Mizaru

Reputation: 313

Google Cloud Build deploy to GKE Private Cluster

I'm running a Google Kubernetes Engine cluster with the "private cluster" option. I've also defined "master authorized networks" to be able to remotely access the environment - this works just fine. Now I want to set up some kind of CI/CD pipeline using Google Cloud Build: after successfully building a new Docker image, this new image should be automatically deployed to GKE.

When I first fired off the new pipeline, the deployment to GKE failed - the error message was something like: "Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout". Suspecting the "master authorized networks" option as the root cause of the connection timeout, I added 0.0.0.0/0 to the allowed networks and started the Cloud Build job again - this time everything went well, and after the Docker image was created it was deployed to GKE. Good.

The only problem that remains is that I don't really want to allow the whole Internet to access my Kubernetes master - that's a bad idea, isn't it?

Are there more elegant solutions that narrow down access via master authorized networks while still allowing deployments from Cloud Build?
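For reference, the allow-all change that made the pipeline work was roughly the following (a sketch - the cluster name and zone are placeholders):

gcloud container clusters update my-cluster --zone=europe-west1-b \
  --enable-master-authorized-networks \
  --master-authorized-networks=0.0.0.0/0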

Upvotes: 28

Views: 9117

Answers (9)

DragonBobZ

Reputation: 2444

Previously, the official GCP guidance was to set up an HA VPN to facilitate a connection between GKE and a custom build pool. In addition to being tedious, complex, and costly (it requires you to reserve 4 static IP addresses!), this method has a serious downside that was a deal-breaker for me: you must disable the public IP address of the control plane for any of this setup to accomplish anything, which means you need something like a bastion instance to connect to the control plane afterwards.

There has been an open issue for the past few years which very recently got an update including a tutorial for a much more satisfactory solution: setting up a NAT VM instance for a Custom Build Pool and adding it as an Authorized Network to your GKE cluster.

Having just today followed the referenced tutorial, I can say this method works with relatively little pain.
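The gist, once the NAT VM has a reserved static external IP, is to add that IP to the cluster's master authorized networks. A rough sketch (the address name, cluster name, region and zone are placeholders; note the flag replaces the whole list, so include any existing CIDRs too):

# Reserve a static external IP for the NAT VM and look it up
gcloud compute addresses create build-nat-ip --region=us-central1
NAT_IP=$(gcloud compute addresses describe build-nat-ip \
  --region=us-central1 --format='value(address)')

# Allow traffic NAT'ed through that IP to reach the control plane
gcloud container clusters update my-cluster --zone=us-central1-a \
  --enable-master-authorized-networks \
  --master-authorized-networks="$NAT_IP/32"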

Upvotes: 0

Tim

Reputation: 21

My solution might not be the prettiest, but it's fairly straightforward: I temporarily whitelist Cloud Build's public IP so that kubectl can update the deployments.

This is what my cloudbuild.yaml looks like. First we run a step that whitelists the public IP:

- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: bash
  args:
  - '-c'
  - |
    apt update \
    && apt install -y jq \
    && cd ~ \
    && gcloud container clusters describe CLUSTERNAME --zone=CLUSTERZONE --project GCPPROJECT --format="json(masterAuthorizedNetworksConfig.cidrBlocks)" > ~/manc.json \
    && (jq ".update.desiredMasterAuthorizedNetworksConfig = .masterAuthorizedNetworksConfig | del(.masterAuthorizedNetworksConfig) | .update.desiredMasterAuthorizedNetworksConfig.enabled = \"true\" | .name = \"projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME\" | .update.desiredMasterAuthorizedNetworksConfig.cidrBlocks += [{\"cidrBlock\":\"`curl -s ifconfig.me`/32\",\"displayName\":\"CloudBuild tmp\"}]" ./manc.json) > ~/manc2.json \
    && curl -X PUT -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -H "Accept: application/json" --data "$(cat manc2.json)" "https://container.googleapis.com/v1beta1/projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME"

We can now run whatever kubectl commands we'd like to run.

This step removes the IP from authorizedNetworks again:

- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: bash
  args:
  - '-c'
  - |
    apt update \
    && apt install -y jq \
    && cd ~ \
    && gcloud container clusters describe CLUSTERNAME --zone=CLUSTERZONE --project GCPPROJECT --format="json(masterAuthorizedNetworksConfig.cidrBlocks)" > ~/manc.json \
    && (jq ".update.desiredMasterAuthorizedNetworksConfig = .masterAuthorizedNetworksConfig | del(.masterAuthorizedNetworksConfig) | .update.desiredMasterAuthorizedNetworksConfig.enabled = \"true\" | .name = \"projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME\" | del(.update.desiredMasterAuthorizedNetworksConfig.cidrBlocks[] | select(.displayName==\"CloudBuild tmp\"))" ./manc.json) > ~/manc2.json \
    && curl -X PUT -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -H "Accept: application/json" --data "$(cat manc2.json)" "https://container.googleapis.com/v1beta1/projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME"

Please fill out CLUSTERNAME, GCPPROJECT, CLUSTERZONE. Feel free to improve =)

Upvotes: 1

cypher15

Reputation: 134

Our workaround was to add steps to the CI/CD pipeline that whitelist Cloud Build's IP via the master authorized networks setting.

Note: the Cloud Build service account needs an additional role for this:

Kubernetes Engine Cluster Admin

In cloudbuild.yaml, add the whitelist step before the deployment step(s).

This step fetches Cloud Build's external IP and then updates the cluster settings:

# Authorize Cloud Build to Access the Private Cluster (Enable Control Plane Authorized Networks)
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Authorize Cloud Build'
  entrypoint: 'bash'
  args:
    - -c
    - |
      apt-get install dnsutils -y &&
      cloudbuild_external_ip=$(dig @resolver4.opendns.com myip.opendns.com +short) &&
      gcloud container clusters update my-private-cluster --zone=$_ZONE --enable-master-authorized-networks --master-authorized-networks $cloudbuild_external_ip/32 &&
      echo $cloudbuild_external_ip

Since Cloud Build's IP has been whitelisted, deployments will proceed without the i/o timeout error.

This avoids the complexity of setting up a VPN or private worker pools.

Disable the Control Plane Authorized Networks after the deployment.

# Disable Control Plane Authorized Networks after Deployment
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Disable Authorized Networks'
  entrypoint: 'gcloud'
  args:
    - 'container'
    - 'clusters'
    - 'update'
    - 'my-private-cluster'
    - '--zone=$_ZONE'
    - '--no-enable-master-authorized-networks'

This approach works well even in cross-project / cross-environment deployments.

Upvotes: 4

siesta

Reputation: 1395

I got cloudbuild working with my private GKE cluster following this google document: https://cloud.google.com/architecture/accessing-private-gke-clusters-with-cloud-build-private-pools

This allows me to use Cloud Build and Terraform to manage a GKE cluster with authorized network access to the control plane enabled. I considered trying to maintain a ridiculous whitelist, but that would ultimately defeat the purpose of using authorized network access control to begin with.

I would note that Cloud Build private pools are generally slower than non-private pools, due to the serverless nature of private pools. I have not experienced the rate limiting that others have mentioned so far.
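Once the pool exists, pointing a build at it is just an option in cloudbuild.yaml - a minimal sketch (the pool resource path is a placeholder):

options:
  pool:
    name: 'projects/GCPPROJECT/locations/REGION/workerPools/PRIVATE_POOL'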

Upvotes: 2

p13rr0m

Reputation: 1297

It is now possible to create a pool of VMs that are connected to your private VPC and can be accessed from Cloud Build.

Quickstart
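For reference, creating such a private pool boils down to something like the following (the pool name, region and peered network are placeholders):

gcloud builds worker-pools create my-private-pool \
  --region=us-central1 \
  --peered-network=projects/GCPPROJECT/global/networks/my-vpc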

Upvotes: 1

Vincent Yin

Reputation: 1686

Update: I suppose this won't hold up in production for the same reason as @dinvlad's update above, i.e., rate limiting in IAP. I'll leave my original post here because it does solve the network connectivity problem and illustrates the underlying networking mechanism.

Furthermore, even if we don't use it for Cloud Build, my method provides a way to tunnel from my laptop to a private GKE master node. Therefore, I can edit K8s YAML files on my laptop (e.g., using VS Code) and immediately execute kubectl from my laptop, rather than having to ship the code to a bastion host and execute kubectl inside it. I find this a big boost to development productivity.

Original answer

I think I might have an improvement to the great solution provided by @dinvlad above.

I think the solution can be simplified without installing an HTTP proxy server; a bastion host is still needed, though.

I offer the following Proof of Concept (without HTTP Proxy Server). This PoC illustrates the underlying networking mechanism without involving the distraction of Google Cloud Build (GCB). (When I have time in the future, I'll test out the full implementation on Google Cloud Build.)

Suppose:

  1. I have a GKE cluster whose master node is private, e.g., having an IP address 10.x.x.x.
  2. I have a bastion Compute Instance named my-bastion. It has only private IP but not external IP. The private IP is within the master authorized networks CIDR of the GKE cluster. Therefore, from within my-bastion, kubectl works against the private GKE master node. Because my-bastion doesn't have an external IP, my home laptop connects to it through IAP.
  3. My laptop at home, with my home internet public IP address, doesn't readily have connectivity to the private GKE master node above.

The goal is for me to execute kubectl on my laptop against that private GKE cluster. From a network architecture perspective, my home laptop is in the same position as the Google Cloud Build server.

Theory: Since gcloud compute ssh (and the associated IAP) is a wrapper around SSH, SSH dynamic port forwarding should achieve that goal for us.

Practice:

## On laptop:
LAPTOP~$ kubectl get ns
^C            <<<=== Without setting anything up, this hangs (no connectivity to GKE).

## Set up SSH Dynamic Port Forwarding (SOCKS proxy) from laptop's port 8443 to my-bastion.
LAPTOP~$ gcloud compute ssh my-bastion --ssh-flag="-ND 8443" --tunnel-through-iap

In another terminal of my laptop:

## Without using the SOCKS proxy, this returns my laptop's home public IP:
LAPTOP~$ curl https://checkip.amazonaws.com
199.xxx.xxx.xxx

## Using the proxy, the same curl command above now returns a different IP address, 
## i.e., the IP of my-bastion. 
## Note: Although my-bastion doesn't have an external IP, I have a GCP Cloud NAT 
## for its subnet (for purpose unrelated to GKE or tunneling).
## Anyway, this NAT is handy as a demonstration for our curl command here.
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 curl -v --insecure https://checkip.amazonaws.com
* Uses proxy env variable HTTPS_PROXY == 'socks5://127.0.0.1:8443'  <<<=== Confirming it's using the proxy
...
* SOCKS5 communication to checkip.amazonaws.com:443
...
* TLSv1.2 (IN), TLS handshake, Finished (20):             <<<==== successful SSL handshake
...
> GET / HTTP/1.1
> Host: checkip.amazonaws.com
> User-Agent: curl/7.68.0
> Accept: */*
...
< Connection: keep-alive
<
34.xxx.xxx.xxx            <<<=== Returns the GCP Cloud NAT'ed IP address for my-bastion 

Finally, the moment of truth for kubectl:

## On laptop:
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 kubectl --insecure-skip-tls-verify=true get ns
NAME              STATUS   AGE
default           Active   3d10h
kube-system       Active   3d10h

Upvotes: 1

dinvlad

Reputation: 1294

Updated answer (02/22/2021)

Unfortunately, while the method below works, IAP tunnels suffer from rate limiting, it seems. If a lot of resources are deployed via kubectl, the tunnel times out after a while. I had to use another trick, which is to dynamically whitelist the Cloud Build IP address via Terraform and then apply directly, which works every time.

Original answer

It is also possible to create an IAP tunnel inside a Cloud Build step:

- id: kubectl-proxy
  name: gcr.io/cloud-builders/docker
  entrypoint: sh
  args:
  - -c
  - docker run -d --net cloudbuild --name kubectl-proxy
      gcr.io/cloud-builders/gcloud compute start-iap-tunnel
      bastion-instance 8080 --local-host-port 0.0.0.0:8080 --zone us-east1-b &&
    sleep 5

This step starts a background Docker container named kubectl-proxy in the cloudbuild network, which is shared by all of the other Cloud Build steps. The Docker container establishes an IAP tunnel using the Cloud Build service account identity. The tunnel connects to a GCE instance with a SOCKS or an HTTPS proxy pre-installed on it (an exercise left to the reader).
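As a sketch of that reader exercise, an HTTP proxy such as tinyproxy on the bastion would do (the package name and config path are Debian assumptions; with an HTTP proxy, the later steps would use HTTPS_PROXY=http://kubectl-proxy:8080 rather than the socks5:// form shown below):

# On the bastion instance (Debian/Ubuntu assumed)
sudo apt-get update && sudo apt-get install -y tinyproxy

# In /etc/tinyproxy/tinyproxy.conf, listen on the tunnelled port and
# allow connections from the VPC range:
#   Port 8080
#   Allow 10.0.0.0/8
sudo systemctl restart tinyproxy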

Inside subsequent steps, you can then access the cluster simply as

- id: setup-k8s
  name: gcr.io/cloud-builders/kubectl
  entrypoint: sh
  args:
  - -c
  - HTTPS_PROXY=socks5://kubectl-proxy:8080 kubectl apply -f config.yml

The main advantages of this approach compared to the others suggested above:

  • No need to have a "bastion" host with a public IP - kubectl-proxy host can be entirely private, thus maintaining the privacy of the cluster
  • Tunnel connection relies on default Google credentials available to Cloud Build, and as such there's no need to store/pass any long-term credentials like an SSH key

Upvotes: 4

Farhan Husain

Reputation: 51

We ended up doing the following:

1) Remove the deployment step from cloudbuild.yaml

2) Install Keel inside the private cluster and give it Pub/Sub Editor privileges in the Cloud Build / registry project

Keel will monitor changes in images and deploy them automatically based on your settings.
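For illustration, Keel is driven by annotations on the workloads themselves - a minimal sketch of what a watched Deployment might look like (the policy/trigger values and image are assumptions, not our exact config):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    keel.sh/policy: force     # redeploy whenever the watched image tag is pushed
    keel.sh/trigger: poll     # poll the registry; GCR Pub/Sub events also work
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: gcr.io/GCPPROJECT/my-app:latest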

This has worked out great, as we now get SHA-hashed image updates pushed automatically, without adding VMs or running any kind of bastion/SSH host.

Upvotes: 5

ahmet alp balkan

Reputation: 45196

It's currently not possible to add Cloud Build machines to a VPC. Similarly, Cloud Build does not announce the IP ranges of its build machines. So you can't do this today without creating an "SSH bastion instance" or a "proxy instance" on GCE within that VPC.

I suspect this will change soon; GCB existed before GKE private clusters, and private clusters are still a beta feature.

Upvotes: 11
