How to Deploy a Multi-cluster Service Mesh on GKE with Anthos

In this article, I am going to explain step-by-step how I deployed a multi-cluster, multi-region service mesh using Anthos Service Mesh. During my proof of concept, I read the documentation at https://cloud.google.com/service-mesh/docs/install, but none of the guides covered exactly my requirements, which are:

Multi-cluster, multi-region service mesh
Google-managed Istio control plane (for added resiliency, and to minimize my effort)
Google-managed CA certificates for Istio mTLS

Deploy the GKE clusters

Deploy the two GKE clusters. I called them asm-a and asm-b (easier to remember) and deployed them in two different regions, using the zones us-west2-a and us-central1-a. Because Anthos Service Mesh requires nodes to have at least 4 vCPUs (plus a few more requirements; see the complete list at https://cloud.google.com/service-mesh/docs/scripted-install/asm-onboarding), use at least the e2-standard-4 machine type.

As preparation work, store the Google Cloud Project ID in an environment variable so that the remaining commands can be copied and pasted directly.

export PROJECT_ID=$(gcloud info --format='value(config.project)')

Then, to deploy the clusters, run:

gcloud container clusters create asm-a --zone us-west2-a --machine-type "e2-standard-4" --disk-size "100" --num-nodes "2" --workload-pool=${PROJECT_ID}.svc.id.goog --async

gcloud container clusters create asm-b --zone us-central1-a --machine-type "e2-standard-4" --disk-size "100" --num-nodes "2" --workload-pool=${PROJECT_ID}.svc.id.goog --async

These commands also enable Workload Identity, which you can read more about at: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity.
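Because the create commands use --async, they return before the clusters are actually ready. Here is a minimal sketch of a polling helper, assuming that the status field reported by gcloud container clusters describe reads RUNNING once a cluster is usable. The function name wait_for_cluster and its retry defaults are my own illustration, not part of any Google tooling.

```shell
# Poll a cluster's status until it reports RUNNING (illustrative helper).
# Usage: wait_for_cluster <cluster-name> <zone> [max-attempts] [sleep-seconds]
wait_for_cluster() {
  wfc_attempts="${3:-60}"
  wfc_sleep="${4:-10}"
  wfc_i=0
  while [ "$wfc_i" -lt "$wfc_attempts" ]; do
    wfc_status="$(gcloud container clusters describe "$1" --zone "$2" --format='value(status)' 2>/dev/null)"
    if [ "$wfc_status" = "RUNNING" ]; then
      echo "$1 is RUNNING"
      return 0
    fi
    echo "$1 status: ${wfc_status:-unknown}; retrying in ${wfc_sleep}s"
    sleep "$wfc_sleep"
    wfc_i=$((wfc_i + 1))
  done
  echo "Timed out waiting for $1" >&2
  return 1
}

# Example calls for the two clusters in this article:
# wait_for_cluster asm-a us-west2-a
# wait_for_cluster asm-b us-central1-a
```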

Fetch the credentials to the clusters

Once the clusters have been created, fetch the credentials needed to connect to them via kubectl. Use the following commands:

gcloud container clusters get-credentials asm-a --zone us-west2-a --project ${PROJECT_ID}
gcloud container clusters get-credentials asm-b --zone us-central1-a --project ${PROJECT_ID}

Easily switch kubectl context with kubectx

kubectx makes it easy to switch between kubectl contexts (in practice, between clusters) by giving them memorable aliases (in this case, asma and asmb). Learn more about the tool at: https://github.com/ahmetb/kubectx.

kubectx asma=gke_${PROJECT_ID}_us-west2-a_asm-a
kubectx asmb=gke_${PROJECT_ID}_us-central1-a_asm-b
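The long context names above follow the gke_<project>_<location>_<cluster> pattern that get-credentials writes into the kubeconfig. As a small sketch, the helper below (gke_context_name is a name I made up) builds that string, and the commented lines show how the same aliases could be created with kubectl's built-in rename-context if kubectx is not installed.

```shell
# Build the kubectl context name created by
# `gcloud container clusters get-credentials`: gke_<project>_<location>_<cluster>.
# Usage: gke_context_name <project-id> <zone> <cluster-name>
gke_context_name() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Possible usage without kubectx:
# kubectl config rename-context "$(gke_context_name "${PROJECT_ID}" us-west2-a asm-a)" asma
# kubectl config rename-context "$(gke_context_name "${PROJECT_ID}" us-central1-a asm-b)" asmb
```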

Set the Mesh ID label for the clusters

Set the mesh_id label on the clusters before installing Anthos Service Mesh, which is needed by Anthos to identify which clusters belong to which mesh. The mesh_id is always in the format proj-<your-project-number>, and the project number for the project can be found by running:

gcloud projects list

Use these commands to create the mesh_id label on both clusters (replace <your-project-number> with the project number found with the previous command):

export MESH_ID="proj-<your-project-number>"
gcloud container clusters update asm-a --zone us-west2-a --project=${PROJECT_ID} --update-labels=mesh_id=${MESH_ID}

gcloud container clusters update asm-b --zone us-central1-a --project=${PROJECT_ID} --update-labels=mesh_id=${MESH_ID}
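As an alternative to copying the project number by hand, the mesh ID could be derived from the project ID. This is a minimal sketch that relies on gcloud projects describe with the projectNumber field; the function name mesh_id_for_project is hypothetical.

```shell
# Derive the mesh ID (proj-<project-number>) from a project ID.
# Usage: mesh_id_for_project <project-id>
mesh_id_for_project() {
  mip_number="$(gcloud projects describe "$1" --format='value(projectNumber)')" || return 1
  if [ -z "$mip_number" ]; then
    echo "No project number found for $1" >&2
    return 1
  fi
  printf 'proj-%s\n' "$mip_number"
}

# Possible usage:
# export MESH_ID="$(mesh_id_for_project "${PROJECT_ID}")"
```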

Enable Stackdriver

Enable Stackdriver on the clusters to be able to see logs, should anything go wrong during the setup!

gcloud container clusters update asm-a --zone us-west2-a --project=${PROJECT_ID} --enable-stackdriver-kubernetes

gcloud container clusters update asm-b --zone us-central1-a --project=${PROJECT_ID} --enable-stackdriver-kubernetes

Create firewall rules for cross-region communication

The clusters live in different regions, therefore a new firewall rule must be created to allow communication between them and their pods. Bash frenzy incoming!

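# Pod CIDR ranges of the two clusters
ASMA_POD_CIDR=$(gcloud container clusters describe asm-a --zone us-west2-a --format=json | jq -r '.clusterIpv4Cidr')
ASMB_POD_CIDR=$(gcloud container clusters describe asm-b --zone us-central1-a --format=json | jq -r '.clusterIpv4Cidr')
# Primary (node) CIDR ranges of the default subnets in both regions
ASMA_PRIMARY_CIDR=$(gcloud compute networks subnets describe default --region=us-west2 --format=json | jq -r '.ipCidrRange')
ASMB_PRIMARY_CIDR=$(gcloud compute networks subnets describe default --region=us-central1 --format=json | jq -r '.ipCidrRange')
ALL_CLUSTER_CIDRS=$ASMA_POD_CIDR,$ASMB_POD_CIDR,$ASMA_PRIMARY_CIDR,$ASMB_PRIMARY_CIDR
# Network tags of the GKE nodes, comma-separated (the name filter assumes the default gke-<cluster-name>-... node naming)
ALL_CLUSTER_NETTAGS=$(gcloud compute instances list --filter="name~^gke-asm-" --format='value(tags.items.[0])' | sort -u | paste -sd, -)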

gcloud compute firewall-rules create asm-multicluster-rule \
    --allow=tcp,udp,icmp,esp,ah,sctp \
    --direction=INGRESS \
    --priority=900 \
    --source-ranges="${ALL_CLUSTER_CIDRS}" \
    --target-tags="${ALL_CLUSTER_NETTAGS}" --quiet
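One thing that can go wrong silently here: if a describe call finds nothing, jq -r prints null (or the variable ends up empty), and the firewall rule is then created with a broken source range. As a hedged sketch, a small guard like the one below (join_cidrs is an illustrative name) could build the comma-separated list and fail early instead.

```shell
# Join the given CIDR values with commas.
# Fails if any value is empty or the literal "null" that jq -r prints
# when the requested field is missing.
join_cidrs() {
  jc_out=""
  for jc_cidr in "$@"; do
    case "$jc_cidr" in
      ""|null)
        echo "Missing CIDR value" >&2
        return 1
        ;;
    esac
    jc_out="${jc_out:+${jc_out},}${jc_cidr}"
  done
  printf '%s\n' "$jc_out"
}

# Possible usage with the variables defined earlier:
# ALL_CLUSTER_CIDRS="$(join_cidrs "$ASMA_POD_CIDR" "$ASMB_POD_CIDR" "$ASMA_PRIMARY_CIDR" "$ASMB_PRIMARY_CIDR")" || exit 1
```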

Install Anthos Service Mesh

First, install the required local tools as explained here: https://cloud.google.com/service-mesh/docs/scripted-install/asm-onboarding#installing_required_tools.

The install_asm tool will install Anthos Service Mesh on the clusters. Pass these options to fulfil the initial requirements:

--managed: Google-managed Istio control plane
--ca mesh_ca: Google-managed CA certificates for Istio mTLS
--enable_registration: automatically registers the clusters with Anthos (it can also be done manually later)
--enable_all: all Google APIs required by the installation will be enabled automatically by the script

./install_asm --project_id ${PROJECT_ID} --cluster_name asm-a --cluster_location us-west2-a --mode install --managed --ca mesh_ca --output_dir asma --enable_registration --enable_all

./install_asm --project_id ${PROJECT_ID} --cluster_name asm-b --cluster_location us-central1-a --mode install --managed --ca mesh_ca --output_dir asmb --enable_registration --enable_all
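The two invocations differ only in cluster name, zone, and output directory, so they could be wrapped in a small function to keep the flags in one place. This is only a sketch: install_mesh_on and the INSTALL_ASM variable are names I introduced for illustration, and the flags are exactly the ones used above.

```shell
# Path to the install_asm script; override it if the script lives elsewhere.
INSTALL_ASM="${INSTALL_ASM:-./install_asm}"

# Run install_asm with the options used in this article.
# Usage: install_mesh_on <cluster-name> <zone> <output-dir>
install_mesh_on() {
  "$INSTALL_ASM" --project_id "${PROJECT_ID}" --cluster_name "$1" --cluster_location "$2" \
    --mode install --managed --ca mesh_ca --output_dir "$3" \
    --enable_registration --enable_all
}

# Possible usage:
# install_mesh_on asm-a us-west2-a asma
# install_mesh_on asm-b us-central1-a asmb
```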

Configure endpoint discovery between clusters

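Endpoint discovery lets each cluster discover the service endpoints running in the other cluster, which is what makes cross-cluster communication possible. It works by giving each cluster a secret with the credentials needed to read from the other cluster's API server.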

Install the required local tools as explained here: https://cloud.google.com/service-mesh/docs/downloading-istioctl, then run the following commands:

istioctl x create-remote-secret --context=asma --name=asm-a | kubectl apply -f - --context=asmb

istioctl x create-remote-secret --context=asmb --name=asm-b | kubectl apply -f - --context=asma
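To confirm that each cluster received the other's secret, something like the check below could be used. It is a sketch that assumes the secrets land in the istio-system namespace, carry the istio/multiCluster=true label, and are named istio-remote-secret-<name>, which is how Istio's create-remote-secret names them as far as I know; has_remote_secret is a hypothetical helper.

```shell
# Succeeds if the cluster behind <context> holds a remote secret for <name>.
# Usage: has_remote_secret <context> <remote-cluster-name>
has_remote_secret() {
  kubectl get secrets --context="$1" -n istio-system -l istio/multiCluster=true -o name \
    | grep -qx "secret/istio-remote-secret-$2"
}

# Possible usage: each cluster should hold the secret of the other one.
# has_remote_secret asmb asm-a && echo "asmb can discover asm-a"
# has_remote_secret asma asm-b && echo "asma can discover asm-b"
```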

Testing the service mesh

Anthos Service Mesh is now ready! Let's deploy a sample application to verify cross-cluster traffic and failovers.

Create the namespace for the Hello World app

Create a new namespace on both clusters and enable automatic Istio sidecar injection for both of them. Since the Istio control plane is managed by Google, the istio-injection label is removed (that is what the trailing dash in istio-injection- does) and the istio.io/rev label is set to asm-managed.

kubectl create --context=asma namespace sample

kubectl label --context=asma namespace sample istio-injection- istio.io/rev=asm-managed --overwrite

kubectl create --context=asmb namespace sample

kubectl label --context=asmb namespace sample istio-injection- istio.io/rev=asm-managed --overwrite
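A quick way to double-check the labels is to read istio.io/rev back from each namespace. The sketch below uses kubectl's jsonpath output with the dot in the label key escaped; namespace_revision is just an illustrative name.

```shell
# Print the value of the istio.io/rev label on a namespace.
# Usage: namespace_revision <context> <namespace>
namespace_revision() {
  kubectl get namespace "$2" --context="$1" -o jsonpath='{.metadata.labels.istio\.io/rev}'
}

# Possible usage: both calls should print asm-managed.
# namespace_revision asma sample
# namespace_revision asmb sample
```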

Create the Hello World service

Deploy the services for the Hello World app on both clusters with:

kubectl create --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l service=helloworld -n sample

kubectl create --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l service=helloworld -n sample

Create the Hello World deployment

Deploy the Hello World sample app, which provides an endpoint that returns the version number of the application (the version number is different in the two clusters) and a Hello World message to go with it.

kubectl create --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l version=v1 -n sample

kubectl create --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l version=v2 -n sample

Deploy the Sleep pod

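The Sleep application is a minimal pod that does nothing but sleep and has curl available, which makes it a handy client for calling services from inside the mesh. Let's use it to test the resilience of the service mesh! To deploy the Sleep application, use: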

kubectl apply --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/sleep/sleep.yaml -n sample

kubectl apply --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/sleep/sleep.yaml -n sample
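Before sending traffic, it may be worth checking that the pods actually got a sidecar. The following sketch assumes the injected container is named istio-proxy (the usual name for Istio's sidecar) and that the pods live in the sample namespace; has_sidecar is a hypothetical helper.

```shell
# Succeeds if the pods with label app=<app> in the sample namespace
# include a container named istio-proxy.
# Usage: has_sidecar <context> <app-label>
has_sidecar() {
  hs_names="$(kubectl get pods --context="$1" -n sample -l "app=$2" \
    -o jsonpath='{.items[*].spec.containers[*].name}')" || return 1
  case " $hs_names " in
    *" istio-proxy "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage:
# for ctx in asma asmb; do
#   has_sidecar "$ctx" helloworld && has_sidecar "$ctx" sleep && echo "$ctx: sidecars injected"
# done
```

If the check fails, the namespace labels are the first thing I would look at, since pods created before the namespace was labeled are not injected until they are recreated.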

Verify cross-cluster traffic

To verify that cross-cluster load balancing works as expected (read as: can the service mesh survive regional failures?), call the HelloWorld service several times using the Sleep pod. To ensure load balancing is working properly, call the HelloWorld service from all clusters in your deployment.

kubectl exec --context=asma -n sample -c sleep "$(kubectl get pod --context=asma -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello

kubectl exec --context=asmb -n sample -c sleep "$(kubectl get pod --context=asmb -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello

Repeat this request several times and verify that the HelloWorld version toggles between v1 and v2. This shows that requests are load-balanced across both clusters, so traffic can still be served by one cluster when the other one is not responding!
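Repeating the request by hand gets tedious, so here is a rough sketch that calls the service in a loop and counts the answers per version. It assumes the sample app's responses look like "Hello version: v1, instance: ..." and that the Sleep pod is running in the sample namespace; call_helloworld and tally_versions are names I made up for this example.

```shell
# Call the HelloWorld service <count> times (default 10) from the Sleep pod.
# Usage: call_helloworld <context> [count]
call_helloworld() {
  ch_pod="$(kubectl get pod --context="$1" -n sample -l app=sleep \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  ch_i=0
  while [ "$ch_i" -lt "${2:-10}" ]; do
    kubectl exec --context="$1" -n sample -c sleep "$ch_pod" -- \
      curl -sS helloworld.sample:5000/hello
    ch_i=$((ch_i + 1))
  done
}

# Read HelloWorld responses on stdin and print "<version> <count>" per version.
tally_versions() {
  sed -n 's/.*version: \(v[0-9][0-9]*\).*/\1/p' | sort | uniq -c | awk '{print $2, $1}'
}

# Possible usage: prints one line per version seen, with its count.
# call_helloworld asma 20 | tally_versions
# call_helloworld asmb 20 | tally_versions
```

If both v1 and v2 show up in the tally for each context, requests from that cluster are reaching both regions.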

Summary

In this article, I have explained how I deployed Anthos Service Mesh on two GKE clusters in different regions with a Google-managed Istio control plane and Google-managed CA certificates. Anthos Service Mesh makes it simple to deploy a multi-cluster service mesh because most of the complexity of Istio is now managed by Google.