
GCP Fundamentals Core Infrastructure


Thoughts

If you’re looking for a sweeping introduction to all the various products under the Google Cloud umbrella, this is definitely the one you need to go through. The spectrum of offerings and the span they cover amazed me. I already knew a few things - like GCE, App Engine, etc. - but surely walked away with a lot more knowledge.

My favorite sections were the labs and the sessions around Storage Options. Storage is an area that I’ve always loved working on. Initially, I was intrigued as to why there were so many solutions - Cloud Storage, Bigtable, Cloud SQL, Cloud Spanner, Cloud Datastore - and then as I got to know more, they made sense. There was a neat table comparing and contrasting the features and use-cases. The labs were well organized, easy to do, and taught the concepts well. Even if you skip the videos and just do the quizzes and the labs, it’s still worth the time. The other cool thing about the course was the availability of slides for each of the modules. Most Coursera courses don’t give slides (which is unfortunate), so it was great to see them here - they were of high quality and very useful.

Also, the presentation and the pro-GCP argument are quite compelling. Anyone who goes through the material would be convinced that there is no other offering in the world that matches the scale of Google. (Talk about Steve Jobs’ reality distortion field.)


Notes

The Key Concepts at the beginning of every module are from Coursera, and the notes are bullet points that I took down after going through each section.

Module 1 - Introducing Google Cloud Platform

Key Concepts

  • Explain the advantages of Google Cloud Platform.
  • Define the components of Google’s network infrastructure, including: Points of presence, data centers, regions, and zones.
  • Compare Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS).

Welcome to GCP Fundamentals

  • Brian Rice from the Google Cloud training team
  • GCP offers four main kinds of services: compute, storage, big data, and machine learning. The course focuses mostly on the first two, together with the topic of networking.

What is cloud computing?

  • NIST (US National Institute of Standards and Technology) definition - Cloud computing is a way of using IT that has these 5 equally important traits -
    1. Customers get computing resources on-demand and self-service.
    2. Customers access these resources over the net from anywhere they want.
    3. The provider of those resources has a big pool of them and allocates them to customers out of that pool.
    4. The resources are elastic.
    5. The customers pay only for what they use or reserve as they go.
  • Ref - https://www.nist.gov/publications/nist-definition-cloud-computing

How did we get here?

  • First Wave - Physical/Colocated - Instead of building costly capital intensive data centers, companies could rent space in shared facilities. (While at ebay, I remember visiting one of the data centers and seeing one section of the building that had hardware dedicated to the company.)
  • Second Wave - Virtualized
  • Third Wave - Serverless - Containerization and automated services.

Every company is a data company

  • In the future, every company will become a data company, differentiating from their competitors in terms of great software centered on data.

GCP computing architectures

  • IaaS - Provides raw compute, storage, and network similar to the ways in which you handle physical machines. e.g. GCE, EC2. Once you get a GCE instance, you can install OS, and needed libraries, deploy applications after that.
  • PaaS - Provides platform that you can deploy your application on. i.e, provides OS, middleware and the runtime. All that you need to do is to deploy your application. E.g. App Engine, Heroku.
  • SaaS - Provides software in the cloud. E.g. G Suite, Dropbox, Workday.

The Google network

  • Google’s network carries as much as 40 percent of the world’s Internet traffic every day.
  • The network interconnects at more than 90 Internet exchanges and more than 100 points of presence (PoPs) worldwide.

GCP regions and zones

  • Regions are independent geographic areas. Locations within regions tend to have round-trip network latencies of under 5 milliseconds at the 95th percentile.
  • A zone is a deployment area within a region.
  • GCP’s services and resources can be multi-regional, regional, or zonal. App Engine, Cloud Datastore, Cloud Storage, BigQuery are examples of multi-regional deployments.
  • As of 2019, Google Cloud Platform has 17 active regions, with more to come.

Environmental responsibility

  • All existing data centers use roughly two percent of the world’s electricity.
  • Google’s data centers were the first to achieve ISO 14001 certification.
  • Google’s data center in Hamina, Finland, has a cooling system that uses seawater from the bay of Finland to reduce energy use.
  • Google is one of the world’s largest corporate purchasers of wind and solar energy.
  • Google has been a hundred percent carbon neutral since 2007.

Google offers customer-friendly pricing

  • Google was the first major Cloud provider to deliver per second billing (!) for IaaS.
  • Many of the best-known GCP services are billed by the second, including GCE and GKE.
  • GCE automatically applies sustained-use discounts, i.e., if you run an instance for more than 25% of a month, discounts are applied for each additional minute.

Open APIs

  • GCP services are compatible with open source products.
  • Cloud Bigtable uses the interface of the open source database Apache HBase.
  • Cloud Dataproc offers the open source big data environment Hadoop, as a managed service.
  • TensorFlow is an open source software library for machine learning.
  • Kubernetes gives customers the ability to mix and match microservices running across different clouds.
  • Google Stackdriver lets customers monitor workload across multiple cloud providers.

This is classic Google. Truly altruistic. So many freebies and open standards. One could argue that no company in the world has done so much to lift the average person out of ignorance. All the free things have unlocked many unknown areas of human potential.

Why choose Google Cloud Platform

  • Variety of solutions - computing, storage, big data, machine learning and application services.
  • Global, cost-effective, open source friendly, designed for security.

Multi-layered security approach

  • Since Google has around 10 services, each with over a billion users, you can be sure that security is one of the key foundations and is ensured at every layer, from the bottom up.
  • Hardware: Both the server boards and the networking equipment are custom-designed. Google also designs custom chips, including a hardware security chip.
  • Secure boot stack: Cryptographic signatures over the BIOS, bootloader, kernel, and base operating system image ensure that the right software stack is booted up.
  • Premises security: Google designs and builds its own data centers, with multiple layers of physical security protections and limited access. These physical security measures are also enforced at third-party data centers.
  • Inter-service communication is encrypted: The infrastructure provides cryptographic privacy and integrity for remote procedure call (“RPC”) data on the network.
  • Identity service: The central identity service goes beyond asking just for username & password. It intelligently challenges users for additional information based on risk factors. Users can enable 2FA, including devices based on the Universal 2nd Factor (U2F) open standard.
  • Encryption at rest: Encryption (using centrally managed keys) is applied at the storage services layer. There is also hardware encryption support in hard drives and SSDs.
  • Secure front end: GFE ensures that all TLS connections are terminated using correct certificates and following best practices such as supporting perfect forward secrecy. It additionally provides protection against DoS attacks.
  • DoS protection: Has multi-tier, multi-layer DoS protections that further reduce the risk of any DoS impact on a service running behind a GFE.
  • Intrusion detection: Rules and machine intelligence give operational security engineers warnings of possible incidents.
  • Reducing insider risk: Google aggressively limits and actively monitors the activities of users with elevated access.
  • Employee U2F use: To guard against phishing attacks against Google employees, employee accounts require use of U2F-compatible Security Keys.
  • Software development practices: Google employs central source control and requires two-party review of new code. Google also provides its developers libraries that prevent them from introducing certain classes of security bugs.
  • Ref: https://cloud.google.com/security/infrastructure/design

Budgets and Billing

  • GCP provides four tools to make sure that a customer doesn’t accidentally run up a huge bill: budgets and alerts, billing export, reports, and quotas.
  • Budgets & alerts - You can define budgets per billing account or per GCP project. Budget limits and alerts can be configured.
  • Billing export - lets you store detailed billing information in places where it’s easy to retrieve for more detailed analysis, such as a BigQuery dataset.
  • Reports - Visual tool in the GCP console that allows you to monitor your expenditure.
  • Quotas - There are two types of quotas: rate quotas and allocation quotas. They are designed to prevent the over-consumption of resources, whether because of error or malicious attack.

Module 2 - Getting Started with Google Cloud Platform

Key Concepts

  • Identify the purpose of projects, folders, and organization nodes on Google Cloud Platform
  • Describe the purpose of and use cases for Identity and Access Management
  • List the methods of interacting with Google Cloud Platform
  • Build a solution deployment using Cloud Launcher.

Module introduction

  • Projects are the main way you organize the resources you use in GCP
  • Customers use Google Cloud Identity and Access Management, also called IAM, to control who can do what
  • Different levels of security responsibility : On-prem - everything from physical security to application security is handled by the customer. As you go from IaaS to PaaS to Managed services, the amount handled by cloud provider increases.

The Google Cloud Platform resource hierarchy

  • Resource hierarchy - At the top level is the organization node. It can contain folders, which can in turn contain other folders. Folders can contain projects. Projects hold all the resources (VMs, storage buckets, tables, etc.). Policies are inherited downwards in the hierarchy.
  • Projects are the basis for enabling and using GCP services. Managing APIs, billing, etc. happen at the project level.
  • The Cloud Resource Manager provides methods that you can use to programmatically manage your projects in Google Cloud Platform. E.g. Get list of projects, create new, update, delete, etc.
  • Project attributes
    • Project ID - Globally unique, Chosen by you, Immutable
    • Project name - Need not be unique, Chosen by you, Mutable
    • Project number - Globally unique, Assigned by GCP, Immutable
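As a sketch of what managing projects through the Cloud Resource Manager looks like from the command line - the project ID and name below are hypothetical:

```shell
# Create a project; the ID is globally unique and immutable once chosen.
gcloud projects create my-sample-project-1234 --name="My Sample Project"

# List the projects you have access to.
gcloud projects list

# The project name is mutable and can be changed later; the ID cannot.
gcloud projects update my-sample-project-1234 --name="Renamed Project"

# Deleting a project shuts down all the resources in it.
gcloud projects delete my-sample-project-1234
```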

Identity and Access Management (IAM)

  • You can assign policies to resources at a level of granularity you choose. Folders offer flexible identity management. Usually policies are assigned at folder level and the projects and resources inherit the policies.
  • To use folders, you need an organization node at the top of the hierarchy.

IAM roles

  • Organization Policy Administrator - Broad control over all cloud resources.
  • Project creator - fine-grained control over the project resources.
  • Organization node would already exist if you’re a G Suite customer, if not, you can use Google Cloud Identity to create one.
  • The policies implemented at a higher level in this hierarchy can’t take away access that’s granted at a lower level
  • IAM defines - “Who”, “Can do what”, “On which resources”
    • “Who” - 4 types of principals - Google accounts, service accounts, Google groups, and G Suite or Cloud Identity domains
    • “Can do what” - a defined role that has a list of permissions
      • e.g. InstanceAdmin - create, delete, start, stop and change an instance.
    • “On which resource”
      • Assign role on a specific element of the resource hierarchy - GCE instance instance_a.
  • 3 types of IAM roles - Primitive, Pre-defined, Custom
    • Primitive roles are broad. You apply them to a GCP project, and they affect all resources in that project. They offer fixed, coarse-grained levels of access. E.g. Owner, Editor, and Viewer roles.
    • Predefined roles apply to a particular GCP service in a project. They offer more fine-grained permissions on particular services. E.g. Compute Engine’s instanceAdmin role.
    • Custom roles let you define a precise set of permissions
  • Service accounts -
    • Provide an identity for carrying out server-to-server interactions in a project
    • Used to authenticate from one service to another
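A hedged sketch of how role bindings and service accounts are wired together with gcloud - PROJECT_ID, the user email, and the service-account name are all placeholders:

```shell
# Grant a predefined role (Compute Engine's instanceAdmin) on a project.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:alice@example.com" \
    --role="roles/compute.instanceAdmin.v1"

# Create a service account for server-to-server interactions...
gcloud iam service-accounts create my-app-sa \
    --display-name="My App Service Account"

# ...and grant it a role, e.g. read-only access to Cloud Storage objects.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:my-app-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
```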

Interacting with Google Cloud Platform

  • You can interact with GCP using different kinds of user credentials
    • Gmail accounts or Google Groups
    • G Suite user accounts and groups
    • Cloud Identity domain users and groups
  • For option #3, if you already have users and groups defined in an LDAP system (say MS AD), Google Cloud Directory Sync can be used for a one-way sync
  • There are 4 ways to interact with GCP
    • Cloud Platform Console - Web user interface
      • Developer tools - Cloud Repositories (git), Cloud Shell
    • Cloud Shell and Cloud SDK - Command-line interface
      • CLI tools - gcloud, gsutil (Cloud Storage), bq (BigQuery)
      • Available as docker image, available via cloud shell
    • Cloud Console Mobile App - For iOS and Android
      • mobile app allows you to start, stop, and SSH into Compute Engine instances, and to see logs from each instance.
    • REST-based API - For custom applications
      • API explorer provides list of APIs. There are also client libraries to make the integration easier.
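A few representative commands for the CLI tools listed above - the project ID and bucket name are placeholders:

```shell
# gcloud - the general-purpose CLI
gcloud config set project PROJECT_ID
gcloud compute instances list

# gsutil - Cloud Storage
gsutil ls gs://my-bucket/

# bq - BigQuery
bq query --use_legacy_sql=false 'SELECT 1 AS answer'
```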

Cloud Marketplace (formerly Cloud Launcher)

  • Lets you quickly deploy functional software packages that run on Google Cloud Platform
  • Packages offered by Google or third parties

Module 3 - Virtual Machines in the Cloud

Key Concepts

  • Identify the purpose of and use cases for Google Compute Engine
  • Summarize the various Google Cloud Platform networking and operational tools and services
  • Build a Compute Engine virtual machine using the Google Cloud Platform (GCP) Console.
  • Build a Compute Engine virtual machine using the gcloud command-line interface.

Module Introduction

  • Compute Engine lets you run virtual machines on Google’s global infrastructure.
  • In this module, we look at how GCE works and also focus on virtual networking

Virtual Private Cloud (VPC) Network

  • VPC networks connect your Google Cloud Platform resources to each other and to the internet.
  • You can segment your networks, use firewall rules to restrict access to instances, and create static routes to forward traffic to specific destinations.
  • VPC networks are global, subnets are regional. Subnets can span the zones that make up a region.
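A minimal sketch of creating a network, a regional subnet, and a firewall rule with gcloud - the names, region, and IP ranges are made up for illustration:

```shell
# Create a custom-mode VPC network (no automatically created subnets).
gcloud compute networks create my-vpc --subnet-mode=custom

# Subnets are regional; this one lives in us-central1 and is usable
# from any zone in that region.
gcloud compute networks subnets create my-subnet \
    --network=my-vpc --region=us-central1 --range=10.0.1.0/24

# A firewall rule restricting SSH access to one source range.
gcloud compute firewall-rules create allow-ssh \
    --network=my-vpc --allow=tcp:22 --source-ranges=203.0.113.0/24
```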

Compute Engine

  • Compute Engine lets you create and run virtual machines on Google infrastructure. You configure a virtual machine much like you build out a physical server.
  • You can create a virtual machine instance by using the Google Cloud Platform Console or the gcloud command-line tool.
  • Features -
    • Per-second billing, sustained use discounts, committed use discounts
    • Preemptible instances - you’ve given Compute Engine permission to terminate it if its resources are needed elsewhere
    • High throughput to storage at no extra cost
    • Custom machine types: Only pay for the hardware you need
  • Scale up or scale out. In 2018, the maximum number of virtual CPUs in a VM was 96, and the maximum memory size was in beta at 624 GB.
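Creating instances with gcloud might look like the following sketch; the zone, machine types, and image family are assumptions and may differ from what’s currently offered:

```shell
# A standard VM in a specific zone (image family is an assumption).
gcloud compute instances create my-vm \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --image-family=debian-11 --image-project=debian-cloud

# A custom machine type: pay only for the vCPUs/memory you need.
gcloud compute instances create my-custom-vm \
    --zone=us-central1-a \
    --custom-cpu=2 --custom-memory=6GB

# A preemptible instance: much cheaper, but GCE may terminate it
# if its resources are needed elsewhere.
gcloud compute instances create my-batch-vm \
    --zone=us-central1-a --preemptible
```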

Important VPC capabilities

  • VPCs have routing tables similar to physical networks. These are used to forward traffic from one instance to another within the same network, even across subnetworks and even between GCP zones, without requiring an external IP address. VPCs’ routing tables are built in; you don’t have to provision or manage a router.
  • VPCs give you a global distributed firewall you can control to restrict access to instances, both incoming and outgoing traffic.
  • VPCs can talk to each other either through shared VPC or VPC peering.
  • Cloud Load Balancing
    • fully distributed, software-defined, managed service for all your traffic.
    • With Cloud Load Balancing, a single anycast IP front-ends all your backend instances in regions around the world.
    • It provides cross-region load balancing, including automatic multi-region failover, which gently moves traffic in fractions if backends become unhealthy.
  • VPC offers a suite of load balancing options -
    • Global HTTP(S) - Layer 7 load balancing based on load, Can route different URLs to different back ends
    • Global SSL Proxy - Layer 4 load balancing of non-HTTPS SSL traffic based on load, Supported on specific port numbers
    • Global TCP Proxy - Layer 4 load balancing of non-SSL TCP traffic, Supported on specific port numbers
    • Regional - Load balancing of any traffic (TCP, UDP), Supported on any port number
    • Regional internal - Load balancing of traffic inside a VPC, Use for the internal tiers of multi-tier applications
  • DNS
    • Google Public DNS (8.8.8.8) is a DNS service offered to Internet users worldwide by Google.
    • Google Cloud DNS, is a DNS hosting service, a managed DNS service running on the same infrastructure as Google. Cloud DNS is also programmable. You can publish and manage millions of DNS zones and records using the GCP Console, the command-line interface, or the API.
  • CDN
    • Cloud CDN - Uses Google’s globally distributed edge caches to cache content close to users.
    • Once you’ve set up HTTP(S) Load Balancing, simply enable Cloud CDN with a single checkbox.
  • Interconnect Options
    • VPN - Secure multi-Gbps connection over VPN tunnels
    • Direct Peering - Private connection between you and Google for your hybrid cloud workloads
    • Carrier Peering - Connection through the largest partner network of service providers
    • Dedicated Interconnect - Connect N X 10G transport circuits for private cloud traffic to Google Cloud at Google POPs
    • Partner Interconnect - Connectivity between your on-premises network and your VPC network through a supported service provider

Module 4 - Storage in the Cloud

Key Concepts

  • Summarize the purpose of and use cases for: Cloud Storage, Cloud SQL, Cloud Spanner, and Cloud Bigtable
  • Choose between the various storage options on Google Cloud Platform
  • Build a BigQuery table using data from Cloud Storage.
  • Use SQL queries to analyze data stored in BigQuery.

Introduction to Google Cloud Platform Storage Options

  • Different applications and workloads require different storage and database solutions.
  • This module reviews the main storage options in GCP: Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore, and Cloud Bigtable.

Cloud Storage

  • Cloud Storage is binary large-object storage
  • Object storage - you say to your storage, “Here, keep this arbitrary sequence of bytes” and the storage lets you address it with a unique key. In Google Cloud Storage, these unique keys are in the form of URLs, which means object storage interacts well with web technologies.
  • Encrypts your data on the server side
  • Data traveling between a customer’s device and Google is encrypted by default using HTTPS/TLS
  • It can be accessed as a File System via third-party tools such as Cloud Storage FUSE.
  • Offline Media Import/Export - A third-party solution that allows you to load data into Google Cloud Storage by sending your physical media (HDDs, tapes, etc.) to a third-party service provider who uploads data on your behalf.
  • Online Import - Cloud Storage Transfer Service enables you to import large amounts of online data into Google Cloud Storage quickly and cost-effectively.
  • Cloud Storage files are organized into buckets.
    • Bucket attributes -> Globally unique name, Storage class, Geo Location, IAM policies or ACLs (can be overridden for objects), Object versioning setting (store past versions), Object lifecycle management rules (delete after 90 days).
  • Four different types of storage classes: Regional, Multi-regional, Nearline and Coldline.
    • Regional - Accessed frequently in a specific region.
    • Multi-regional - geo-redundant. Most frequently accessed across a broader geographic area.
    • Nearline - Accessed less than once a month. Low-cost, highly durable storage service for storing infrequently accessed data.
    • Coldline - Accessed less than once a year. Very-low-cost, highly durable storage service for data archiving, online backup, and disaster recovery.
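A sketch of creating buckets in different storage classes with gsutil - the bucket names are placeholders, and lifecycle.json is a hypothetical local policy file:

```shell
# Regional bucket for data accessed frequently in one region.
gsutil mb -c regional -l us-central1 gs://my-regional-bucket/

# Nearline bucket for data accessed less than once a month.
gsutil mb -c nearline -l us gs://my-archive-bucket/

# Enable object versioning to keep past versions of objects.
gsutil versioning set on gs://my-archive-bucket/

# Apply lifecycle rules (e.g. delete after 90 days) from a JSON file.
gsutil lifecycle set lifecycle.json gs://my-archive-bucket/
```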

Cloud Storage interactions

  • There are several ways to bring data into Cloud Storage
    • Online transfer - Self-managed copies using command-line tools (gsutil) or drag-and-drop
    • Storage Transfer Service - Scheduled, managed batch transfers
    • Transfer Appliance - rackable, high-capacity storage server that you lease from Google Cloud. You simply connect it to your network, load it with data, and then ship it to an upload facility where the data is uploaded to Cloud Storage.
  • Other ways of getting data into Cloud Storage
    • BigQuery & Cloud SQL - Import export tables
    • App Engine - Logs, Cloud Datastore backups, images.
    • Compute Engine - instance startup scripts, CE images.

Google Cloud Bigtable

  • Cloud Bigtable - Google’s NoSQL big data database service. Powers many core Google services, including Search, Analytics, Maps, and Gmail.
  • Ideal for applications that need very high throughput and scalability for unstructured key/value data, where each value is typically no larger than 10 MB.
  • Cloud Bigtable is offered through the same open source API as HBase, the native Hadoop database. This enables portability of applications between HBase and Bigtable
  • Attributes - Replicated storage, Data encryption in-flight and at rest, Role-based ACLs
  • Choose Bigtable if data is Big (>1TB), Fast (rapidly changing, time series), NoSQL (not tied to a schema)
  • Access Patterns
    • API - Read and write through data service layer like HBase REST Server, or a Java Server using the HBase client.
    • Streaming - Data can also be streamed in through a variety of popular stream processing frameworks like Cloud Dataflow Streaming, Spark Streaming, and Storm.
    • Batch Processing - Data can also be read from and written to Cloud Bigtable through batch processes like Hadoop MapReduce, Dataflow, or Spark.
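For illustration, the cbt command-line tool (installable via the Cloud SDK) can exercise Bigtable directly; the instance, table, column family, and row key below are all made up:

```shell
# Create a table and a column family in a hypothetical instance.
cbt -project=PROJECT_ID -instance=my-instance createtable sensor-data
cbt -project=PROJECT_ID -instance=my-instance createfamily sensor-data readings

# Write a cell (rows are addressed by key) and read it back.
cbt -project=PROJECT_ID -instance=my-instance \
    set sensor-data device1#1680000000 readings:temp=21.5
cbt -project=PROJECT_ID -instance=my-instance read sensor-data
```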

Google Cloud SQL and Google Cloud Spanner

  • Cloud SQL - Managed RDBMS
    • Automatic replication - Replication from Cloud SQL master instance, external master instance, external MySQL instance
    • Managed backups - retains up to 7 backups for each instance, which is included in the cost of your instance
    • Vertical scaling (read and write) - Easily scale up to 64 processor cores and more than 100 GB of RAM.
    • Horizontal scaling (read) - Quickly scale out with read replicas.
    • Security - data encrypted at rest
  • Accessible by other GCP services and even external services.
    • App Engine - Access using standard drivers.
    • Compute Engine - Access Cloud SQL instances using an external IP address.
    • External services - Access using standard drivers like Connector/J for Java or MySQLdb for Python.
  • Cloud Spanner - Horizontally scalable managed RDBMS
    • Strong global consistency, including strongly consistent secondary indexes
    • High availability through synchronous and built-in data replication.
    • Suitable for DB sizes exceeding 2 TB and 10s of thousands of R/W per second.
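A hedged sketch of standing up both services from the command line - the instance names, tier, and regions are assumptions:

```shell
# Cloud SQL: create a managed MySQL instance.
gcloud sql instances create my-db \
    --database-version=MYSQL_8_0 --tier=db-n1-standard-1 --region=us-central1

# Scale reads horizontally with a read replica.
gcloud sql instances create my-db-replica --master-instance-name=my-db

# Cloud Spanner: an instance plus a database for horizontally
# scalable, strongly consistent SQL.
gcloud spanner instances create my-spanner \
    --config=regional-us-central1 --nodes=1 --description="Demo instance"
gcloud spanner databases create orders --instance=my-spanner
```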

Google Cloud Datastore

  • Horizontally scalable NoSQL DB
  • Supports ACID transactions
  • Schema less
  • Access through RESTful interface
  • High availability of reads & writes

Comparing Storage Options

  • Cloud Storage -
    • Blob store, no transactions, no complex queries, capacity PB+, max 5 TB per object.
    • Use case - structured and unstructured binary or object data, like images, large media files and backups.
  • BigTable -
    • NoSQL wide-column store, single-row transactions, no complex queries, capacity PB+, 10 MB per cell, 100 MB per row.
    • Use case - analytical data with heavy read and write events, like AdTech, financial or IoT data.
  • Cloud SQL -
    • Relational SQL for OLTP, supports transactions and complex queries, capacity up to 10 TB, row size determined by DB engine.
    • Use case - web frameworks and existing applications, like storing user credentials and customer orders.
  • Cloud Spanner -
    • Relational SQL for OLTP, supports transactions and complex queries, capacity in PB, 10 GB per row
    • Large-scale database applications that are larger than 2 TB. For example, for financial trading and e-commerce use cases.
  • Cloud Datastore -
    • NoSQL document store, supports transactions, no complex queries, capacity in TB, 1 MB per entity.
    • Use case - Semi-structured application data that is used in App Engine applications.

Module 5 - Containers in the Cloud

Key Concepts

  • Define the concept of a container and identify uses for containers
  • Identify the purpose of and use cases for Google Container Engine and Kubernetes
  • Build a Kubernetes cluster using Kubernetes Engine.
  • Deploy and manage Docker containers in Kubernetes Engine using the kubectl command.

Containers, Kubernetes, and Kubernetes Engine

  • Comparison Overview -
    • Compute Engine, IaaS offering, with access to servers, file systems, and networking.
    • App Engine, PaaS offering, with access to preset runtimes, managed services.
    • Kubernetes Engine is a hybrid which conceptually sits between the two and benefits from both.
  • Why Container ?
    • Compute Engine - the smallest unit of compute is a VM, where you can configure anything from runtime, disk I/O, networking, web server, DB, etc. As the demand increases, you have to copy an entire VM and boot the guest OS for each instance of your app, which can be slow and costly.
    • App Engine - packages everything including services and dependent libraries, scales automatically. But does not have the flexibility of compute engine.
    • Container - gives independent scalability of PaaS and access to OS and hardware like IaaS.
  • Container -
    • An invisible box around your code and its dependencies, with limited access to its own partition of the file system and hardware.
    • Simple to create and starts as a process.
    • It is like virtualizing the OS.
    • So you can go from development, to staging, to production, or from your laptop to the cloud, without changing or rebuilding anything.

Introduction to Kubernetes and GKE

  • Kubernetes makes it easy to orchestrate many containers on many hosts, scale them as microservices, and deploy rollouts and rollbacks.
  • Docker - An open-source tool that defines a format for bundling your application, its dependencies, and machine-specific settings into a container;
  • Google Container Builder - a managed service for building container images in the cloud, rather than building them locally with Docker.
  • Use a Dockerfile to specify how your code gets packaged into a container.
  • “docker build” command to build the container. This builds the container and stores it locally as a runnable image.
  • “docker run” command to run the image.
  • Kubernetes is a set of APIs that you can use to deploy containers on a set of nodes called a cluster
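The build-run-deploy loop described above might be sketched like this, assuming a hypothetical image name and a GKE cluster that kubectl is already pointed at:

```shell
# Build the image from a Dockerfile in the current directory and tag it.
docker build -t gcr.io/PROJECT_ID/hello-app:v1 .

# Run it locally, mapping container port 8080 to the host.
docker run -d -p 8080:8080 gcr.io/PROJECT_ID/hello-app:v1

# Push it to the registry, then deploy it on the cluster with kubectl.
docker push gcr.io/PROJECT_ID/hello-app:v1
kubectl create deployment hello-app --image=gcr.io/PROJECT_ID/hello-app:v1
kubectl scale deployment hello-app --replicas=3
kubectl expose deployment hello-app --type=LoadBalancer --port=80 --target-port=8080
```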

Introduction to Hybrid and Multi-Cloud Computing (Anthos)

  • The time required to complete an on-premises upgrade could be anywhere from several months to one or more years. It may also be quite costly.
  • Hybrid/multi-cloud architecture allows you to keep parts of your systems infrastructure on-premises while moving other parts to the Cloud, creating an environment that is uniquely suited to your company’s needs.
  • Anthos - hybrid and multi-cloud solution powered by the latest innovations in distributed systems, and service management software from Google.
  • Anthos relies on Kubernetes and Google Kubernetes engine deployed on-prem.

Module 6 - Applications in the Cloud

Key Concepts

  • Explain the purpose of and use cases for Google App Engine and Google Cloud Datastore
  • Compare the App Engine Standard environment with the App Engine Flexible environment
  • Express the purpose of and use cases for Google Cloud Endpoints
  • Express the purpose of and use cases for Apigee Edge.
  • Preview an App Engine application using Cloud Shell.
  • Launch an App Engine application and then disable it.

Introduction to App Engine

  • App Engine is a PaaS solution for building scalable web applications and mobile backends. It manages the infrastructure so that customers can focus on building applications.
  • Provides you with hardware and networking infrastructure, built-in services and APIs such as NoSQL datastores, memcache, load balancing, health checks, application logging, and a user authentication API, common to most applications.
  • No servers to maintain. App Engine will scale your application automatically in response to the amount of traffic it receives.
  • Security Scanner automatically scans and detects common web application vulnerabilities.
  • Works with popular development tools such as Eclipse, IntelliJ, Maven, Git, Jenkins, and PyCharm.

Google App Engine Standard Environment

  • Containers are preconfigured with available runtimes which include libraries that support standard APIs
  • Features
    1. Persistent storage
    2. Asynchronous task queues, scheduled tasks for triggering events
    3. Integration with other Google Cloud services
  • Limitations
    1. No writing to the local file system
    2. Requests time out at 60 seconds
    3. Third-party software installations are limited.
  • Example standard workflow -
    1. Develop & test the web application locally
    2. Use the SDK to deploy to App Engine
    3. App Engine automatically scales & reliably serves your web application
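That workflow might look like the following sketch for a Python app; the runtime name is an assumption, so check the currently supported runtimes:

```shell
# Minimal app.yaml for the Python standard environment
# (runtime name is an assumption).
cat > app.yaml <<'EOF'
runtime: python39
EOF

# Deploy; App Engine handles scaling and serving from here.
gcloud app deploy app.yaml

# Open the deployed application in a browser.
gcloud app browse
```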

Google App Engine Flexible Environment

  • Instead of the sandbox, App Engine Flexible Environment lets you specify the container your application runs in.
  • Comparison with Standard
    • Standard - Instance startup in milliseconds; no SSH access; no writing to local disk; no support for 3rd-party binaries; network access only via App Engine services; pricing model - daily free quota, then pay per instance class, with automatic shutdown
    • Flexible - Instance startup in minutes; SSH access yes (needs to be enabled); writing to local disk yes (ephemeral); 3rd-party binaries supported; full network access; pricing model - pay for resource allocation per hour, with no automatic shutdown

Google Cloud Endpoints and Apigee Edge

  • Google Cloud Platform provides two API management tools.
  1. Cloud Endpoints - A distributed API management system. It provides an API console, hosting, logging, monitoring, and other features to help you create, share, maintain, and secure your APIs. You can use Cloud Endpoints with any APIs that support the OpenAPI Specification (Swagger spec).
  • Uses the distributed Extensible Service Proxy, a service proxy based on NGINX that runs in its own Docker container for better isolation and scalability.
  • Features
    • User authentication
    • Automated deployment
    • Logging and monitoring
    • API keys
    • Easy integration
  2. Apigee Edge - A platform for developing and managing API proxies. It has a different orientation, though: it focuses on business problems like rate limiting, quotas, and analytics.

Module 7 - Developing, Deploying and Monitoring in the Cloud

Key Concepts

  • Understand options for software developers to host their source code.
  • Understand the purpose of template-based creation and management of resources.
  • Understand the purpose of integrated monitoring, alerting, and debugging.
  • Build a Deployment Manager deployment.
  • Update a Deployment Manager deployment.
  • View the load on a VM instance using Stackdriver.

Development in the cloud

  • Cloud Source Repositories -
    • Provides Git version control to support collaborative development of any application or service.
    • If you are using the Stackdriver Debugger, you can use Cloud Source Repositories and related tools to view debugging information alongside your code during application runtime
    • CSR also provides a source viewer that you can use to browse and view repository files from within the Google Cloud Platform Console
    • As with Bitbucket, you can have any number of private repositories
  • Cloud Functions -
    • This is a cool concept. It is a lightweight, event-based, asynchronous compute solution that allows you to create small functions that execute without having to manage a server or a runtime environment.
    • You are billed, to the nearest 100 milliseconds, only while your code is running.
    • Cloud Functions are written in JavaScript and execute in a managed Node.js environment on Google Cloud Platform.

Deployment: Infrastructure as code

  • Deployment Manager -
    • An infrastructure management service that automates the creation and management of GCP resources.
    • To use Deployment Manager, you create a template file, in either YAML or Python, that describes what you want the components of your environment to look like. Deployment Manager then figures out and carries out the actions needed to create the environment your template describes.
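For instance, a Deployment Manager configuration declaring a single VM might look like this sketch (the resource name, zone, machine type, and image are illustrative):

```yaml
# Illustrative Deployment Manager configuration declaring one Compute Engine VM.
resources:
- name: example-vm                    # hypothetical resource name
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: zones/us-central1-a/machineTypes/f1-micro
    disks:
    - deviceName: boot
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-11
    networkInterfaces:
    - network: global/networks/default
```

Because the template is just a file, it can be version-controlled alongside application code, which is the point of "infrastructure as code."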

Monitoring: Proactive instrumentation

  • Stackdriver -
    • GCP’s tool for monitoring, logging, and diagnostics.
    • Gives you access to many different kinds of signals from your infrastructure platforms, virtual machines, containers, middleware, and application tier: logs, metrics, traces.
    • Gives you insight into your application’s health, performance, and availability
  • The six core components of Google Stackdriver: Monitoring, Logging, Trace, Error Reporting, Debugger, and Profiler.

Module 8 - Big Data and Machine Learning in the Cloud

Key Concepts

  • Understand the purpose of and use cases for the products in the Google Cloud big data platform.
  • Understand the purpose of and use cases for the products in the Google Cloud machine learning platform.
  • Load data into a BigQuery table from Cloud Storage.
  • Use SQL queries to analyze data in BigQuery.

Introduction to Big Data and Machine Learning

  • Google Cloud Big Data solutions - An integrated, serverless platform. “Serverless” means you don’t have to provision compute instances to run your jobs.
    • Cloud Dataproc - Managed Hadoop MapReduce, Spark, Pig, and Hive service
    • Cloud Dataflow - Stream and batch processing; unified and simplified pipelines
    • BigQuery - Analytics database; stream data at 100,000 rows per second
    • Cloud Pub/Sub - Scalable and flexible enterprise messaging
    • Cloud Datalab - Interactive data exploration

Cloud Big Data Platform - Cloud Dataproc

  • A fast, easy, managed way to run Hadoop, Spark, Hive, and Pig on Google Cloud Platform.
  • Easily migrate on-premises Hadoop jobs to the cloud.
  • Quickly analyze data (like log data) stored in Cloud Storage; create a cluster in 90 seconds or less on average, and then delete it immediately.
  • Use Spark/Spark SQL to quickly perform data mining and analysis.
  • Use Spark Machine Learning Libraries (MLlib) to run classification algorithms.

Cloud Dataflow

  • Lets you develop and execute a wide range of data processing patterns: extract-transform-load (ETL), batch computation, and continuous computation
  • Features - Resource Management, On Demand, Intelligent Work Scheduling, Auto scaling, Unified Programming Model, Open Source, Monitoring, Integrated, Reliable & Consistent Processing.
  • Use cases -
    • ETL (extract/transform/load) pipelines to move, filter, enrich, shape data
    • Data analysis: batch computation or continuous computation using streaming
    • Orchestration: create pipelines that coordinate services, including external services

BigQuery

  • Fully managed, petabyte-scale, low-cost analytics data warehouse.
  • Ad-hoc SQL queries on a massive dataset
  • Features - Flexible Data Ingestion, Global Availability, Security and Permissions, Cost Controls, Highly Available, Super Fast Performance, Fully Integrated, Connect with Google Products
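An ad-hoc query is just standard SQL run against a table; this sketch uses one of BigQuery's public sample datasets for illustration (the dataset and column names are assumptions about the public samples, not from the course):

```sql
-- Illustrative standard SQL over a BigQuery public sample table.
SELECT year, COUNT(*) AS births
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 10;
```

There is no cluster to size or manage; you submit the query and pay for the data scanned.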

Cloud Pub/Sub and Cloud Datalab

  • Cloud Pub/Sub
    • Fully managed real-time messaging service that allows you to send and receive messages between independent applications.
    • Designed to provide “at least once” delivery at low latency with on-demand scalability to 1 million messages per second
    • Features - Highly Scalable, Push and Pull Delivery, Encryption, Replicated Storage, Message Queue, End-to-End Acknowledgement, Fan-out, REST API
  • Cloud Datalab
    • Interactive tool for large-scale data exploration, transformation, analysis, and visualization
    • Integrated and open source; built on Jupyter
    • Features - Integrated, Multi-language support, Notebook format, pay-per-use pricing, Interactive data visualization, Collaborative, Open Source, Custom deployment.
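The Pub/Sub fan-out idea above can be sketched locally, without GCP, as a toy in-memory topic that delivers every published message to every subscriber. This is a conceptual sketch only, not the Cloud Pub/Sub client API:

```javascript
// Toy in-memory publish/subscribe topic illustrating fan-out:
// each published message is delivered to every subscriber.
class Topic {
  constructor() { this.subscribers = []; }
  subscribe(handler) { this.subscribers.push(handler); }
  publish(message) {
    // Fan-out: deliver the message to each subscriber independently.
    this.subscribers.forEach((handler) => handler(message));
  }
}

// Two independent "applications" subscribing to the same topic.
const topic = new Topic();
const seenByBilling = [];
const seenByAudit = [];
topic.subscribe((msg) => seenByBilling.push(msg));
topic.subscribe((msg) => seenByAudit.push(msg));
topic.publish({ id: 1, text: 'order created' });
```

The real service adds what this toy omits: durable replicated storage, push and pull delivery, and per-message acknowledgement so that unacknowledged messages are redelivered ("at least once").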

Google Cloud Machine Learning Platform

  • TensorFlow - Open source tool to build and run neural network models.
    • Wide platform support: CPU or GPU; mobile, server, or cloud.
    • Each Cloud TPU provides 180 teraflops of performance
  • Cloud ML - Fully managed machine learning service
    • Familiar notebook-based developer experience
    • Optimized for Google infrastructure; integrates with BigQuery and Cloud Storage

Machine learning APIs

  • Machine Learning APIs - Pre-trained machine learning models built by Google
    • Cloud Vision API: Identify objects, landmarks, text, and content
    • Cloud Speech API: Stream results in real time, detects 80 languages
    • Cloud Natural Language API: Structure and meaning of text
    • Cloud Translation API: Language translation including detection
    • Cloud Video Intelligence API: Annotate the contents of videos, Detect scene changes, Flag inappropriate content
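These pre-trained models are invoked over REST; for example, a Cloud Vision label-detection request body takes roughly this shape (the image URI is illustrative):

```json
{
  "requests": [
    {
      "image": { "source": { "imageUri": "gs://my-bucket/photo.jpg" } },
      "features": [ { "type": "LABEL_DETECTION", "maxResults": 5 } ]
    }
  ]
}
```

The response lists labels with confidence scores, so applications get machine-learning results without training or hosting a model.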

Written by Robinson Raju