Reputation: 313
I'm running a Google Kubernetes Engine cluster with the "private-cluster" option. I've also defined "authorized master networks" to be able to remotely access the environment - this works just fine. Now I want to set up some kind of CI/CD pipeline using Google Cloud Build - after a new Docker image is successfully built, it should be automatically deployed to GKE.
When I first fired off the new pipeline, the deployment to GKE failed - the error message was something like: "Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout". Since I suspected the "authorized master networks" option to be the root cause of the connection timeout, I added 0.0.0.0/0 to the allowed networks and started the Cloud Build job again - this time everything went well, and after the Docker image was created it was deployed to GKE. Good.
The only problem that remains is that I don't really want to allow the whole Internet to be able to access my Kubernetes master - that's a bad idea, isn't it?
Are there more elegant solutions that narrow down access using master authorized networks while still allowing deployment via Cloud Build?
Upvotes: 28
Views: 9117
Reputation: 2444
Previously, the official GCP guidance was to set up an HA VPN to facilitate a connection between GKE and a custom build pool. In addition to being tedious, complex, and costly (requiring you to reserve 4 static IP addresses!), this method has a serious downside that was a deal-breaker for me: you must disable the public IP address for the control plane for any of this setup to accomplish anything, which means you need something like a bastion instance to connect to the control plane afterwards.
There has been an open issue for the past few years which very recently got an update including a tutorial for a much more satisfactory solution: setting up a NAT VM instance for a Custom Build Pool and adding it as an Authorized Network to your GKE cluster.
Having just today followed the referenced tutorial, I can say this method works with relatively little pain.
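For reference, once the NAT VM's static IP is known, adding it as an authorized network boils down to something like the following sketch (CLUSTER_NAME, REGION, and NAT_VM_IP are placeholders; the tutorial covers the full NAT and pool setup):
# Add the NAT VM's IP to the cluster's authorized networks (note: this flag
# replaces the existing list, so include any CIDRs you already rely on).
gcloud container clusters update CLUSTER_NAME \
  --region=REGION \
  --enable-master-authorized-networks \
  --master-authorized-networks=NAT_VM_IP/32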
Upvotes: 0
Reputation: 21
My solution might not be the prettiest, but it's fairly straightforward: I'm temporarily whitelisting Cloud Build's public IP so I can run kubectl to update the deployments.
This is what my cloudbuild.yaml looks like. First we run a container to whitelist the public IP:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: bash
  args:
    - '-c'
    - |
      apt update \
      && apt install -y jq \
      && cd ~ \
      && gcloud container clusters describe CLUSTERNAME --zone=CLUSTERZONE --project GCPPROJECT --format="json(masterAuthorizedNetworksConfig.cidrBlocks)" > ~/manc.json \
      && (jq ".update.desiredMasterAuthorizedNetworksConfig = .masterAuthorizedNetworksConfig | del(.masterAuthorizedNetworksConfig) | .update.desiredMasterAuthorizedNetworksConfig.enabled = \"true\" | .name = \"projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME\" | .update.desiredMasterAuthorizedNetworksConfig.cidrBlocks += [{\"cidrBlock\":\"`curl -s ifconfig.me`/32\",\"displayName\":\"CloudBuild tmp\"}]" ./manc.json) > ~/manc2.json \
      && curl -X PUT -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type:application/json" -H "Accept: application/json" --data "$(cat manc2.json)" "https://container.googleapis.com/v1beta1/projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME"
We can now run whatever kubectl commands we'd like to run.
This container will remove the IP from the authorized networks again:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: bash
  args:
    - '-c'
    - |
      apt update \
      && apt install -y jq \
      && cd ~ \
      && gcloud container clusters describe CLUSTERNAME --zone=CLUSTERZONE --project GCPPROJECT --format="json(masterAuthorizedNetworksConfig.cidrBlocks)" > ~/manc.json \
      && (jq ".update.desiredMasterAuthorizedNetworksConfig = .masterAuthorizedNetworksConfig | del(.masterAuthorizedNetworksConfig) | .update.desiredMasterAuthorizedNetworksConfig.enabled = \"true\" | .name = \"projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME\" | del(.update.desiredMasterAuthorizedNetworksConfig.cidrBlocks[] | select(.displayName==\"CloudBuild tmp\"))" ./manc.json) > ~/manc2.json \
      && curl -X PUT -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type:application/json" -H "Accept: application/json" --data "$(cat manc2.json)" "https://container.googleapis.com/v1beta1/projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME"
Please fill in CLUSTERNAME, GCPPROJECT, and CLUSTERZONE. Feel free to improve =)
Upvotes: 1
Reputation: 134
Our workaround was to add steps to the CI/CD pipeline that whitelist Cloud Build's IP via master authorized networks.
Note: an additional role is needed for the Cloud Build service account: Kubernetes Engine Cluster Admin.
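If it helps, granting that role is roughly the following (a sketch; GCPPROJECT and PROJECT_NUMBER are placeholders, and the account shown is the default Cloud Build service account):
# Allow the Cloud Build service account to modify master authorized networks.
gcloud projects add-iam-policy-binding GCPPROJECT \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/container.clusterAdmin"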
In cloudbuild.yaml, add the whitelist step before the deployment step(s).
This step fetches Cloud Build's external IP and then updates the cluster settings:
# Authorize Cloud Build to Access the Private Cluster (Enable Control Plane Authorized Networks)
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Authorize Cloud Build'
  entrypoint: 'bash'
  args:
    - -c
    - |
      apt-get install dnsutils -y &&
      cloudbuild_external_ip=$(dig @resolver4.opendns.com myip.opendns.com +short) &&
      gcloud container clusters update my-private-cluster --zone=$_ZONE --enable-master-authorized-networks --master-authorized-networks $cloudbuild_external_ip/32 &&
      echo $cloudbuild_external_ip
Since Cloud Build's IP has been whitelisted, the deployments will proceed without the i/o timeout error.
This removes the complexity of setting up VPN / private worker pools.
Disable the Control Plane Authorized Networks after the deployment.
# Disable Control Plane Authorized Networks after Deployment
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Disable Authorized Networks'
  entrypoint: 'gcloud'
  args:
    - 'container'
    - 'clusters'
    - 'update'
    - 'my-private-cluster'
    - '--zone=$_ZONE'
    - '--no-enable-master-authorized-networks'
This approach works well even in cross-project / cross-environment deployments.
Upvotes: 4
Reputation: 1395
I got Cloud Build working with my private GKE cluster by following this Google document: https://cloud.google.com/architecture/accessing-private-gke-clusters-with-cloud-build-private-pools
This allows me to use Cloud Build and Terraform to manage a GKE cluster with authorized network access to the control plane enabled. I considered trying to maintain a ridiculous whitelist, but that would ultimately defeat the purpose of using authorized network access control to begin with.
I would note that Cloud Build private pools are generally slower than non-private pools, due to the serverless nature of private pools. I have not experienced the rate limiting that others have mentioned.
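For anyone who hasn't set this up yet, creating the pool itself is a single command; a rough sketch with placeholder names (the VPC peering and VPN pieces are what the linked document walks through):
# Create a private worker pool attached to the peered VPC.
gcloud builds worker-pools create my-private-pool \
  --region=us-central1 \
  --peered-network=projects/GCPPROJECT/global/networks/my-vpc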
Upvotes: 2
Reputation: 1297
It is now possible to create a pool of VMs that are connected to your private VPC and can be accessed from Cloud Build.
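If I recall correctly, you can then point a build at such a pool either via options.pool.name in cloudbuild.yaml or with the --worker-pool flag; a rough sketch with placeholder names:
# Submit a build that runs inside the private worker pool.
gcloud builds submit --config=cloudbuild.yaml \
  --region=us-central1 \
  --worker-pool=projects/GCPPROJECT/locations/us-central1/workerPools/my-private-pool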
Upvotes: 1
Reputation: 1686
Update: I suppose this won't work at production scale, for the same reason as @dinvlad's update above, i.e., rate limiting in IAP. I'll leave my original post here because it does solve the network connectivity problem and illustrates the underlying networking mechanism.
Furthermore, even if we don't use it for Cloud Build, my method provides a way to tunnel from my laptop to a K8s private master node. Therefore, I can edit K8s yaml files on my laptop (e.g., using VS Code) and immediately execute kubectl from my laptop, rather than having to ship the code to a bastion host and execute kubectl inside the bastion host. I find this a big boost to development-time productivity.
Original answer
================
I think I might have an improvement to the great solution provided by @dinvlad above.
I think the solution can be simplified without installing an HTTP proxy server; we still need a bastion host, though.
I offer the following Proof of Concept (without HTTP Proxy Server). This PoC illustrates the underlying networking mechanism without involving the distraction of Google Cloud Build (GCB). (When I have time in the future, I'll test out the full implementation on Google Cloud Build.)
Suppose I have a bastion host, my-bastion. It has only a private IP, not an external IP. The private IP is within the master authorized networks CIDR of the GKE cluster. Therefore, from within my-bastion, kubectl works against the private GKE master node. Because my-bastion doesn't have an external IP, my home laptop connects to it through IAP.
The goal is for me to execute kubectl on my laptop against that private GKE cluster. From a network architecture perspective, my home laptop's position is like the Google Cloud Build server.
Theory: Knowing that gcloud compute ssh (and the associated IAP) is a wrapper for SSH, SSH dynamic port forwarding should achieve that goal for us.
Practice:
## On laptop:
LAPTOP~$ kubectl get ns
^C <<<=== Without setting anything up, this hangs (no connectivity to GKE).
## Set up SSH Dynamic Port Forwarding (SOCKS proxy) from laptop's port 8443 to my-bastion.
LAPTOP~$ gcloud compute ssh my-bastion --ssh-flag="-ND 8443" --tunnel-through-iap
In another terminal of my laptop:
## Without using the SOCKS proxy, this returns my laptop's home public IP:
LAPTOP~$ curl https://checkip.amazonaws.com
199.xxx.xxx.xxx
## Using the proxy, the same curl command above now returns a different IP address,
## i.e., the IP of my-bastion.
## Note: Although my-bastion doesn't have an external IP, I have a GCP Cloud NAT
## for its subnet (for purpose unrelated to GKE or tunneling).
## Anyway, this NAT is handy as a demonstration for our curl command here.
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 curl -v --insecure https://checkip.amazonaws.com
* Uses proxy env variable HTTPS_PROXY == 'socks5://127.0.0.1:8443' <<<=== Confirming it's using the proxy
...
* SOCKS5 communication to checkip.amazonaws.com:443
...
* TLSv1.2 (IN), TLS handshake, Finished (20): <<<==== successful SSL handshake
...
> GET / HTTP/1.1
> Host: checkip.amazonaws.com
> User-Agent: curl/7.68.0
> Accept: */*
...
< Connection: keep-alive
<
34.xxx.xxx.xxx <<<=== Returns the GCP Cloud NAT'ed IP address for my-bastion
Finally, the moment of truth for kubectl:
## On laptop:
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 kubectl --insecure-skip-tls-verify=true get ns
NAME STATUS AGE
default Active 3d10h
kube-system Active 3d10h
Upvotes: 1
Reputation: 1294
Unfortunately, while the method below works, IAP tunnels suffer from rate limiting, it seems. If a lot of resources are deployed via kubectl, the tunnel times out after a while. I had to use another trick: dynamically whitelist the Cloud Build IP address via Terraform and then apply directly, which works every time.
It is also possible to create an IAP tunnel inside a Cloud Build step:
- id: kubectl-proxy
  name: gcr.io/cloud-builders/docker
  entrypoint: sh
  args:
    - -c
    - docker run -d --net cloudbuild --name kubectl-proxy
      gcr.io/cloud-builders/gcloud compute start-iap-tunnel
      bastion-instance 8080 --local-host-port 0.0.0.0:8080 --zone us-east1-b &&
      sleep 5
This step starts a background Docker container named kubectl-proxy in the cloudbuild network, which is used by all of the other Cloud Build steps. The Docker container establishes an IAP tunnel using the Cloud Build Service Account identity. The tunnel connects to a GCE instance with a SOCKS or an HTTPS proxy pre-installed on it (an exercise left to the reader).
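For that "exercise left to the reader", one minimal option (my own assumption, not part of the original setup) is to have the bastion expose a SOCKS listener via an SSH dynamic forward to itself, rather than installing a dedicated proxy daemon:
# On the bastion instance: open a SOCKS5 listener on port 8080 by dynamic-forwarding
# through an SSH session to localhost (assumes key-based SSH to localhost works).
# Traffic arriving through the IAP tunnel on 8080 then egresses from the bastion.
ssh -f -N -D 0.0.0.0:8080 localhost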
Inside subsequent steps, you can then access the cluster simply as:
- id: setup-k8s
  name: gcr.io/cloud-builders/kubectl
  entrypoint: sh
  args:
    - -c
    - HTTPS_PROXY=socks5://kubectl-proxy:8080 kubectl apply -f config.yml
The main advantage of this approach compared to the others suggested above is that the kubectl-proxy host can be entirely private, thus maintaining the privacy of the cluster.
Upvotes: 4
Reputation: 51
We ended up doing the following:
1) Remove the deployment step from cloudbuild.yaml
2) Install Keel inside the private cluster and give it Pub/Sub Editor privileges in the Cloud Build / registry project
Keel will monitor changes in images and deploy them automatically based on your settings.
This has worked out great, as we now get SHA-hashed image updates pushed automatically, without adding VMs or setting up any kind of bastion/SSH host.
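A sketch of the permission grant, assuming Keel runs under a dedicated service account (the account and project names here are hypothetical):
# Grant Keel's service account Pub/Sub Editor in the project that hosts Cloud Build /
# Container Registry, so it can receive image push notifications.
gcloud projects add-iam-policy-binding REGISTRY_PROJECT \
  --member="serviceAccount:keel@REGISTRY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/pubsub.editor"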
Upvotes: 5
Reputation: 45196
It's currently not possible to add Cloud Build machines to a VPC. Similarly, Cloud Build does not announce the IP ranges of the build machines. So you can't do this today without creating an "SSH bastion instance" or a "proxy instance" on GCE within that VPC.
I suspect this will change soon; GCB existed before GKE private clusters, and private clusters are still a beta feature.
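In the meantime, the bastion approach usually amounts to running kubectl over SSH from a build step; a rough sketch (instance, zone, and deployment names are placeholders):
# From a Cloud Build step: SSH into a bastion inside the VPC (here via IAP)
# and run the deployment there.
gcloud compute ssh ssh-bastion --zone=us-central1-a --tunnel-through-iap --command \
  "kubectl set image deployment/my-app my-app=gcr.io/GCPPROJECT/my-app:latest"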
Upvotes: 11