Manu Chadha

Reputation: 16755

NodeIP, ClusterIP and LoadBalancer in Kubernetes

My question is built on the question and answers from this question - What's the difference between ClusterIP, NodePort and LoadBalancer service types in Kubernetes?

The question might not be well-formed for some of you.

I am trying to understand the differences between clusterIP, nodePort and Loadbalancer and when to use these with an example. I suppose that my understanding of the following concepts is correct. K8s consists of the following components:

  • Node - A VM or physical machine. Runs kubectl and docker process
  • Pod - unit which encapsulates container(s) and volumes (storage). If a pod contains multiple containers then shared volume could be the way for process communication
  • Node can have one or multiple pods. Each pod will have its own IP
  • Cluster - replicas of a Node. Each node in a cluster will contain same pods (instances, type)

Here is the scenario:

My application has a web server (always returning 200 OK) and a database (always returning the same value) for simplicity. Also, say I am on GCP and I make images of the web server and of the database. Each of these will be run in its own respective pods and will have 2 replicas.

I suppose I'll have two clusters (cluster-webserver (node1-web (containing pod1-web), node2-web (containing pod2-web)) and cluster-database (node1-db (containing pod1-db), node2-db (containing pod2-db)). Each node will have its own IP address (node1-webip, node2-webip, node1-dbip, node2-dbip).

A client application (browser) should be able to access the web application from outside the web cluster, but the database shouldn't be accessible from outside the database cluster. However, web nodes should be able to access database nodes.

If I use nodePort then K8s will open a port on each of the nodes and will forward nodeIP:nodePort to clusterIP (on pod):clusterPort.

As I am on GCP, I can use a LoadBalancer for the web cluster to get an external IP. Using the external IP, the client application can access the web service.

I saw this configuration for a LoadBalancer

spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  type: LoadBalancer

  • Question 3 - Is it exposing an external IP and port 80 to the outside world? What would be the value of nodePort in this case?

Upvotes: 1

Views: 3703

Answers (1)

mario

Reputation: 11158

My question is built on the question and answers from this question - What's the difference between ClusterIP, NodePort and LoadBalancer service types in Kubernetes?

The question might not be well-formed for some of you.

It's OK, but in my opinion it's a bit too extensive for a single question; it could be posted as a few separate questions, as it touches quite a few different topics.

I am trying to understand the differences between clusterIP, nodePort and Loadbalancer and when to use these with an example. I suppose that my understanding of the following concepts is correct. K8s consists of the following components:

  • Node - A VM or physical machine. Runs kubectl and docker process

Not kubectl but kubelet. You can check it by ssh-ing into your node and running systemctl status kubelet. And yes, it also runs some sort of container runtime. It doesn't have to be exactly docker.

  • Pod - unit which encapsulates container(s) and volumes (storage). If a pod contains multiple containers then shared volume could be the way for process communication
  • Node can have one or multiple pods. Each pod will have its own IP

That's correct.

  • Cluster - replicas of a Node. Each node in a cluster will contain same pods (instances, type)

Not really. Kubernetes nodes are not different replicas. They are part of the same kubernetes cluster, but they are independent instances which are capable of running your containerized apps. In kubernetes terminology this is called a workload. Workload isn't part of the kubernetes cluster; it's something that you run on it. Your Pods can be scheduled on different nodes and it doesn't always have to be an even distribution. Suppose you have a kubernetes cluster consisting of 3 worker nodes (nodes on which workload can be scheduled, as opposed to master nodes, which usually run only kubernetes control plane components). If you deploy your application as a Deployment, e.g. 5 different replicas of the same Pod are created. Usually they are scheduled on different nodes, but a situation where node1 runs 2 replicas, node2 runs 3 replicas and node3 runs zero replicas is perfectly possible.

You need to keep in mind that there are different clustering levels. You have your kubernetes cluster which basically is an environment to run your containerized workload.

There are also clusters within this cluster, i.e. it is perfectly possible that your workload forms clusters as well, e.g. you can have a database deployed as a StatefulSet and it can run in a cluster. In such a scenario, different stateful Pods will form members or nodes of such a cluster.

Even if your Pods don't communicate with each other but e.g. serve exactly the same content, the Deployment resource makes sure that a certain number of replicas of such a Pod is always up and running. If one kubernetes node for some reason becomes unavailable, such a Pod needs to be re-scheduled on one of the available nodes. So the replication of your workload isn't achieved by deploying it on different kubernetes nodes, but by assuring that a certain amount of replicas of a Pod of a certain kind is always up and running, and those replicas may be running on the same as well as on different kubernetes nodes.
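To illustrate, a minimal Deployment manifest that asks kubernetes to keep 5 replicas of the same Pod up and running, wherever the scheduler places them, could look like the sketch below (the name my-app and the image are made up for this example):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5           # keep 5 replicas of the Pod running at all times
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: web
        image: nginx:1.21
        ports:
        - containerPort: 80

The scheduler is free to place these 5 replicas on any of the worker nodes; the Deployment only guarantees their count, not their distribution.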

Here is the scenario:

My application has a web server (always returning 200 OK) and a database (always returning the same value) for simplicity. Also, say I am on GCP and I make images of the web server and of the database. Each of these will be run in its own respective pods and will have 2 replicas.

I suppose I'll have two clusters (cluster-webserver (node1-web (containing pod1-web), node2-web (containing pod2-web)) and cluster-database (node1-db (containing pod1-db), node2-db (containing pod2-db)). Each node will have its own IP address (node1-webip, node2-webip, node1-dbip, node2-dbip).

See above what I wrote about different clustering levels. Clusters formed by your app have nothing to do with the kubernetes cluster or its nodes. And I would say you would rather have 2 different microservices communicating with each other and in some way also dependent on one another. But yes, you may see your database as a separate db cluster deployed within the kubernetes cluster.

A client application (browser) should be able to access the web application from outside the web cluster, but the database shouldn't be accessible from outside the database cluster. However, web nodes should be able to access database nodes.

  • Question 1 - Am I correct that if I create a service for web (webServiceName) and a service for database then by default, I'll get only clusterIP and a port (or targetPort).

Yes, the ClusterIP Service type is often simply called a Service, because it's the default Service type. If you don't specify a type, like in this example, a ClusterIP Service is created. To understand the difference between port and targetPort you can take a look at this answer or the kubernetes official docs.
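As an illustration, a minimal ClusterIP Service for your db Pods could look like the sketch below (the name my-db, the label and the ports are made up for the example). Note that no type is specified, so ClusterIP is assumed:

apiVersion: v1
kind: Service
metadata:
  name: my-db
spec:
  selector:
    app: my-db          # matches the label of the backend db Pods
  ports:
    - protocol: TCP
      port: 5432        # port the Service itself listens on (on its cluster IP)
      targetPort: 5432  # port exposed by the backend db Pods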

  • Question 1.2 - Am I correct that clusterIP is an IP assigned to a pod, not the node i.e. in my example, clusterIP gets assigned to pod1-web, not node1-web even though node1 has only pod1.

Basically yes. ClusterIP is one of the things that can be easily misunderstood, as the term is also used to denote a specific Service type, but in this context yes, it's an internal IP assigned within a kubernetes cluster to a specific resource, in this case to a Pod (a Service object has its own cluster IP assigned as well). Pods, as part of a kubernetes cluster, get their own internal IPs (from the kubernetes cluster perspective) - cluster IPs. Nodes can have a completely different addressing scheme. Their IPs can also be private, but they are not cluster IPs, in other words they are not internal kubernetes cluster IPs from the cluster's perspective. Apart from those external IPs (external from the kubernetes cluster perspective), kubernetes nodes, as legitimate API resources / objects, also have their own cluster IPs assigned.

You can check it by running:

kubectl get nodes --output wide

It will show you both the internal and the external IPs of your nodes.

  • Question 1.3 - Am I correct that as cluster IP is accessible from only within the cluster, pod1-web and pod2-web can talk to each other and pod1-db and pod2-db can talk to each other using clusterIP/dns:port or clusterIP/dns:targetPort but web can't talk to database (and vice versa) and external client can't talk to web? Also, the nodes are not accessible using the cluster IP.

Yes, cluster IPs are only accessible from within the cluster. And yes, web pods and db pods can communicate with each other (typically the communication is initiated from the web pods), provided you exposed the db pods via a ClusterIP Service. As already mentioned, this type of Service exposes a set of Pods forming one microservice to some other set of Pods which need to communicate with them, and it exposes them only internally, within the cluster, so no external client has access to them. You expose your Pods externally by using LoadBalancer, NodePort or, in many scenarios, via an ingress (which under the hood also uses a loadbalancer).

This fragment is not very clear to me:

but web can't talk to database (and vice versa) and external client can't talk to web? Also, the nodes are not accessible using the cluster IP.

If you expose your db via a Service to be accessible from the web Pods, they will have access to it. And if your web Pods are exposed to the external world, e.g. via LoadBalancer or NodePort, they will be accessible from outside. And yes, nodes won't be accessible from outside by their cluster IPs, as those are private, internal IPs of the kubernetes cluster.

  • Question 1.4 - Am I correct that dns i.e. servicename.namespace.svc.cluster.local would map the clusterIP?

Yes, specifically to the cluster IP of this Service. More on that you can find here.
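Sticking to the hypothetical my-db Service from the sketch above, deployed in the default namespace, any Pod in the cluster could reach it by the name:

my-db.default.svc.cluster.local

and Pods in the same namespace can use simply my-db. The name resolves to the Service's cluster IP, not to the IPs of the individual backend Pods.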

  • Question 1.5 - For which type of applications I might use only clusterIP? Where multiple instances of an application need to communicate with each other (eg master-slave configuration)?

For something that doesn't need to be exposed externally, like backend services that are accessible not directly from outside but through some frontend Pods which process external requests and pass them to the backend afterwards. It may also be used for database pods, which practically never should be accessed directly from outside.

If I use nodePort then K8s will open a port on each of the nodes and will forward nodeIP:nodePort to clusterIP (on pod):clusterPort.

Yes, in a NodePort Service configuration the destination port exposed by a Pod is called targetPort. Somewhere in between there is also a port, which refers to the port of the Service itself. So the Service has its own cluster IP (different than the backend Pods' IPs) and its own port, which is usually the same as targetPort (targetPort defaults to the value set for port) but can be set to a different value.
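A minimal NodePort Service showing all three ports might look like this (the name, label and values are made up for the example; if you omit nodePort, kubernetes picks a free port from the default 30000-32767 range for you):

apiVersion: v1
kind: Service
metadata:
  name: my-web
spec:
  type: NodePort
  selector:
    app: my-web
  ports:
    - protocol: TCP
      nodePort: 30080   # opened on every node of the cluster
      port: 80          # port of the Service itself (on its cluster IP)
      targetPort: 8080  # port exposed by the backend Pods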

  • Question 2 - Can web nodes now access database nodes using nodeIP:nodePort which will route the traffic to the database's clusterIP (on pod):clusterPort/targetPort?

I think you've mixed it up a bit. If web is something external to the kubernetes cluster, it might make sense to access Pods deployed on the kubernetes cluster via nodeIP:nodePort, but if it's part of the same kubernetes cluster, it can use a simple ClusterIP Service.

(I have read that clusterIP/dns:nodePort will not work.)

From the external world, of course, it won't work, as cluster IPs are not accessible from outside; they are internal kubernetes IPs. But from within the cluster? It's perfectly possible. As I said in a different part of my answer, kubernetes nodes also have their cluster IPs, and it's perfectly possible to access your app on its nodePort from within the cluster, i.e. from some other Pod. So when you look at the internal (cluster) IP addresses of the nodes in my example, it is also perfectly possible to run:

root@nginx-deployment-85ff79dd56-5lhsk:/# curl http://10.164.0.8:32641
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
  • Question 2.1 - How do I get a node's IP? Is nodeIP the IP I'll get when I run the describe pods command?

To check IPs of your nodes run:

kubectl get nodes --output wide

It will show you both their internal (yes, nodes also have their ClusterIPs!) and external IPs.

  • Question 2.2 - Is there a dns equivalent for the node IP as node IP could change during failovers. Or does dns now resolve to the node's IP instead of clusterIP?

No, there isn't. Take a look at What things get DNS names?

  • Question 2.3 - I read that K8s will create endpoints for each service. Is an endpoint the same as a node or the same as a pod? If I run kubectl describe pods or kubectl get endpoints, would I get the same IPs?

No, endpoints is another type of kubernetes API object / resource.

$ kubectl api-resources | grep endpoints
endpoints                         ep                                          true         Endpoints

If you run:

kubectl explain endpoints

you will get its detailed description:

KIND:     Endpoints
VERSION:  v1

DESCRIPTION:
     Endpoints is a collection of endpoints that implement the actual service.
     Example: Name: "mysvc", Subsets: [
         {
           Addresses: [{"ip": "10.10.1.1"}, {"ip": "10.10.2.2"}],
           Ports: [{"name": "a", "port": 8675}, {"name": "b", "port": 309}]
         },
         {
           Addresses: [{"ip": "10.10.3.3"}],
           Ports: [{"name": "a", "port": 93}, {"name": "b", "port": 76}]
         },
     ]

Usually you don't have to worry about creating the endpoints resource, as it is created automatically. So to answer your question: an endpoints object stores information about Pods' IPs and keeps track of them, as Pods can be destroyed and recreated and their IPs are subject to change. For a Service to keep routing the traffic properly although Pods' IPs change, an object like endpoints must exist which keeps track of those IPs.

You can easily check it by yourself. Simply create a deployment consisting of 3 Pods and expose it as a simple ClusterIP Service. Check its endpoints object. Then delete one Pod, verify that its IP has changed, and check its endpoints object again. You can do it by running:

kubectl get ep <endpoints-object-name> -o yaml

or

kubectl describe ep <endpoints-object-name>
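For illustration, with the hypothetical my-db Service from earlier backed by 3 Pods, the output of the first command could look something like this (the IPs are made up):

$ kubectl get ep my-db
NAME    ENDPOINTS                                   AGE
my-db   10.8.0.5:5432,10.8.1.7:5432,10.8.2.3:5432   3m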

So basically the different endpoints (as many as there are backend Pods exposed by a certain Service) are the internal (cluster) addresses of the Pods exposed by that Service, but the endpoints object / API resource is a single kubernetes resource that keeps track of those endpoints. I hope this is clear.

As I am on GCP, I can use a LoadBalancer for the web cluster to get an external IP. Using the external IP, the client application can access the web service.

I saw this configuration for a LoadBalancer

spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  type: LoadBalancer


  • Question 3 - Is it exposing an external IP and port 80 to the outside world? What would be the value of nodePort in this case?

Yes, under the hood a call to the GCP API is made so that an external load balancer with a public IP is created (strictly speaking, for a LoadBalancer Service this is a TCP/network load balancer; GCP's http/https load balancer is what an Ingress creates under the hood).

Suppose you have a Deployment called nginx-deployment. If you run:

kubectl expose deployment nginx-deployment --type LoadBalancer

It will create a new Service of LoadBalancer type. If you then run:

kubectl get svc

you will see your LoadBalancer Service has both external IP and cluster IP assigned.

NAME               TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)        AGE
nginx-deployment   LoadBalancer   10.3.248.43   <some external ip> 80:32641/TCP   102s

If you run:

$ kubectl get svc nginx-deployment
NAME               TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)        AGE
nginx-deployment   LoadBalancer   10.3.248.43   <some external ip>   80:32641/TCP 👈  16m

You'll notice that the nodePort value for this Service has also been set, in this case to 32641. If you want to dive into it even deeper, run:

kubectl get svc nginx-deployment -o yaml

and you will see it in this section:

...
spec:
  clusterIP: 10.3.248.43
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 32641 👈
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: LoadBalancer 👈
...

As you can see, although the Service type is LoadBalancer, it also has its nodePort value set. And you can test that it works by accessing your Deployment using this port, not on the IP of the LoadBalancer but on the IPs of your nodes. I know it may seem pretty confusing, as LoadBalancer and NodePort are two different Service types. The LB needs to distribute the incoming traffic to some backend Pods (e.g. managed by a Deployment) and needs this nodePort value set in its own specification to be able to route the traffic to Pods scheduled on different nodes. I hope this is a bit clearer now.

Upvotes: 3
