Lukas Morkūnas

Reputation: 23

Custom load balancing in Kubernetes

We are developing simulation software that is deployed and scaled across multiple pods using Kubernetes. When a user makes a simulation request, a pod is selected, starts the job, and is considered busy. When another user makes a simulation request, it should be routed to the next free pod. Currently a busy pod is often selected (even though free ones exist), because Kubernetes does not know which pods are busy and which are free.

Is it possible to balance requests in such a way that a free pod is always selected? (Assume that each app instance inside a pod exposes an HTTP endpoint that reports its current busy/free status.)

Upvotes: 2

Views: 974

Answers (2)

GPuri

Reputation: 843

An alternative to the readiness-probe approach suggested in the other answer is to use a headless Service together with a reverse proxy/load balancer such as HAProxy (which is what I used).

You can either create a headless Service or change the existing Service that exposes your pods, as shown below. It is the clusterIP: None setting that makes the Service headless.

apiVersion: v1
kind: Service
metadata:
  name: some-service
  namespace: dev
  labels:
    app: <labels here>
spec:
  ports:
    - port: <port>
      targetPort: <tport>
  selector:
    app: <selectors here>
  clusterIP: None

Then you can deploy HAProxy as shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-headless
  namespace: dev
  labels:
    app: haproxy-headless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy-headless
  template:
    metadata:
      labels:
        app: haproxy-headless
    spec:
      containers:
      - name: haproxy-headless
        image: haproxy:1.9
        ports:
        - containerPort: 8888
          name: management
        - containerPort: 8085
          name: http
        volumeMounts:
        - name: haproxy-config
          mountPath: "/usr/local/etc/haproxy/haproxy.cfg"
          # subPath mounts just the config file; without it the mountPath
          # would become a directory containing the ConfigMap keys
          subPath: haproxyconfig.cfg
      volumes:
        - name: haproxy-config
          configMap:
            name: haproxy-config

ConfigMap for HAProxy:

apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-config
  namespace: dev
data:
   haproxyconfig.cfg: |
     
     defaults
       mode tcp
       log global
       option tcplog
       retries 5
       timeout connect 10s
       timeout client 300s
       timeout server 300s 
     
     resolvers test_resolver
       nameserver dns1 <your dns server address>
       resolve_retries 30
       timeout retry 2s
       hold valid 10s
       accepted_payload_size 8192
     
     frontend stats
        mode http
        bind :8888
        stats enable
        stats uri /
        stats refresh 15s
     
     frontend test_fe
       bind :8085
       timeout client 60s
       default_backend test_be
       
     backend test_be
       balance leastconn
       server-template srv 7 <service-name>.<your namespace>.svc.cluster.local:6311 check resolvers test_resolver

The important thing to understand here is how HAProxy is configured: server-template is used for service discovery through the Kubernetes DNS (the headless Service resolves to one A record per ready pod, and each record becomes a backend server), while balance leastconn ensures that the server with the fewest active connections, i.e. the least busy pod, is chosen. A commented sketch of the backend follows.
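For illustration only, here is the same backend with the placeholders filled in from the manifests above (Service some-service in namespace dev) and an assumed application port of 6311; adding init-addr none is a common way to let HAProxy start even while the DNS names are not yet resolvable:

backend test_be
  # leastconn sends each new connection to the server that currently has
  # the fewest open connections, i.e. the least busy pod
  balance leastconn
  # srv1..srv7 are filled from the A records of the headless Service;
  # init-addr none lets HAProxy boot before DNS resolution succeeds
  server-template srv 7 some-service.dev.svc.cluster.local:6311 check resolvers test_resolver init-addr none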

Finally, a Service is required to expose the HAProxy deployment itself:

apiVersion: v1
kind: Service
metadata:
  name: haproxy-service
  namespace: dev
spec:
  selector:
    app: haproxy-headless
  ports:
    - name: be
      protocol: TCP
      port: 6311
      targetPort: 8085
    - name: mngmnt
      protocol: TCP
      port: 8888
      targetPort: 8888
  type: <Type as per requirement>

Now use this Service whenever you want to access the app (from inside the cluster that would be haproxy-service.dev.svc.cluster.local:6311). It is also worth reading up on how to use HAProxy in Kubernetes for load balancing; it essentially does service discovery plus load balancing.

A few of the values shown in the YAML may not be exactly correct or may be mismatched, but I hope the concept of how to achieve this is clear.

Upvotes: 1

anemyte

Reputation: 20306

I think you can make use of readiness probes:

Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.

You can make the application respond to probe requests with a non-200 return code while it is working on a simulation. Kubernetes will notice this and stop sending new requests to the pod until the readiness probe succeeds again (a minimal probe sketch follows the list below). There are downsides, though:

  • when all pods are busy you'll get a 502 error;
  • users will not be able to submit subsequent requests to their pod (because the pod will be busy);
  • changing readiness status takes some time, so if you receive many requests (more than the number of pods) within a short interval (the probe interval), some pods may receive more than one request.
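A minimal sketch of such a probe, assuming a hypothetical /status endpoint on port 8080 that returns 200 while the pod is free and 503 while it is busy (the endpoint path, port, and container name are placeholders, not from the question):

    spec:
      containers:
      - name: simulation           # hypothetical container name
        image: <your image>
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /status          # assumed busy/free endpoint of the app
            port: 8080
          periodSeconds: 2         # probe often so busy pods leave rotation quickly
          failureThreshold: 1      # one "busy" reply marks the pod not ready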

Upvotes: 1
