Sandeep

Reputation: 1

Designing a highly scalable app on Kubernetes

I am building an API application in Python that will be used by a large volume of users to run their job requests. A user submits a job request with input values to an API endpoint and receives a JobID in response, and the application runs the job in the background. The user can then poll and retrieve the result once the job completes.

The initial version of this application defines a fixed number of replicas and handles multiple concurrent requests. However, as I increase the load, either requests start to fail or the application (the gunicorn worker) inside the pod starts failing.

I was thinking of creating pods dynamically for the background jobs, which I have never done before. I think it would help with scaling, since each pod would die once its job finishes. However, I am not sure how I would manage those pods (what happens if a pod or a container inside it fails, etc.). In addition, I would also be limited by the number of pods I can create.

Can someone share their experience building a similar application that has to handle a large volume of requests? How should such an application be designed to be highly scalable and resilient?

Upvotes: 0

Views: 124

Answers (2)

derekleeth

Reputation: 31

Just in case others come across this post: the challenge I ran into was what David mentioned, namely how to automatically scale the backend pods that are the workers for Python RQ. I ended up creating a new container/deployment that leverages the Python Schedule module and runs every couple of minutes, checking the running jobs on a couple of queues I wanted to scale up and down based on active jobs. I was hitting a wall finding any info on how to do this, so I just started playing around with the code and testing different methods.

Please note that you can ignore the excessive logging in the code. It's just my experience from writing enterprise applications: tons of logging is the best way to quickly troubleshoot issues when they come up. Here's the code I created to solve this.

#! /usr/bin/env python

import os
import logging
import logging.handlers
import time
import schedule
from kubernetes import client, config
from redis import Redis
from rq import Queue

script_name = "auto-scaler"

if os.path.exists("/var/log/"):
    log_file = f"/var/log/{script_name}.log"
else:
    log_file = f"{script_name}.log"

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - [%(name)s-%(module)s-%(funcName)s-%(lineno)s] - %(levelname)s - %(message)s',
    handlers=[
        logging.handlers.RotatingFileHandler(log_file, maxBytes=5000000, backupCount=5),
        logging.StreamHandler()
    ]
)

# Connect to the Redis instance that backs the RQ queues.
redis_conn = Redis(host='automation-broker-redis', port=6380, db=0)
logging.info("Connected to the redis container with the default endpoint.")

q2_general = Queue('general', connection=redis_conn)
q3_initial = Queue('initial', connection=redis_conn)

def main():
    logging.info('Starting main method.')

    # Check each queue and scale its worker deployment between the given
    # minimum and maximum replica counts.
    scale_deployment("automation-broker-worker-general", "automation-broker", q2_general, 5, 30)
    scale_deployment("automation-broker-worker-initial", "automation-broker", q3_initial, 10, 30)


def scale_deployment(deployment_name: str, namespace: str, queue: Queue, min_replicas: int, max_replicas: int):
    logging.info(f"Scaling deployment {deployment_name} in namespace {namespace}.")
    # Authenticate using the in-cluster service account mounted into the pod.
    config.load_incluster_config()
    v1apps = client.AppsV1Api()

    ret = v1apps.read_namespaced_deployment(name=deployment_name, namespace=namespace)
    logging.info(f"Total replicas for {deployment_name} deployment: {ret.spec.replicas}")
    logging.info(f"Available replicas for {deployment_name} deployment: {ret.status.available_replicas}")
    total_replicas = ret.spec.replicas
    available_replicas = ret.status.available_replicas

    logging.info(f"Minimum replicas for {deployment_name} deployment: {min_replicas}")
    logging.info(f"Maximum replicas for {deployment_name} deployment: {max_replicas}")

    # Pending jobs plus jobs in the started and scheduled registries.
    total_jobs = len(queue) + queue.started_job_registry.count + queue.scheduled_job_registry.count
    logging.info(f"Total jobs in queue for {deployment_name} deployment: {total_jobs}")

    if total_jobs > total_replicas and total_replicas < max_replicas:
        logging.info(f"Scaling up {deployment_name} deployment.")
        ret.spec.replicas = ret.spec.replicas + 1
        v1apps.patch_namespaced_deployment(name=deployment_name, namespace=namespace, body=ret)

    if total_jobs == 0 and ret.spec.replicas > min_replicas:
        logging.info(f"Scaling down {deployment_name} deployment.")
        ret.spec.replicas = ret.spec.replicas - 1
        v1apps.patch_namespaced_deployment(name=deployment_name, namespace=namespace, body=ret)

    logging.info(f"Completed scaling deployment {deployment_name} in namespace {namespace}.")

if __name__ == '__main__':
    # Re-check the queues and deployments every two minutes.
    schedule.every(2).minutes.do(main)

    logging.info("Starting scheduler service...")

    while True:
        schedule.run_pending()
        time.sleep(1)
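
For context, the worker deployments this script scales are assumed to run ordinary RQ worker processes. A minimal worker entrypoint for the general queue could look like the sketch below (the file name worker.py is made up for the example; the Redis host, port, and queue name match the script above):

#! /usr/bin/env python
# worker.py - hypothetical entrypoint for the pods scaled by the script above
from redis import Redis
from rq import Queue, Worker

redis_conn = Redis(host='automation-broker-redis', port=6380, db=0)

# Each pod runs one blocking worker; adding replicas adds workers on the queue.
Worker([Queue('general', connection=redis_conn)], connection=redis_conn).work()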

Upvotes: 0

David Maze

Reputation: 159752

Use a dedicated work queue for the background jobs. In a Python-native context, Celery will work fine; if you think you may need to interoperate across languages, RabbitMQ is a popular open-source option; there are other choices too.

Similarly, write a dedicated worker process to actually run the jobs.
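
For illustration, if you use Celery, a minimal worker module along these lines might look like the sketch below (the module name tasks.py, the Redis URLs, and the run_job task are assumptions made up for this example):

# tasks.py - hypothetical Celery worker module; broker/backend URLs are assumptions
from celery import Celery

# The broker is the work queue; the result backend is where the worker
# writes results so that the status endpoint can poll for them.
celery_app = Celery(
    "jobs",
    broker="redis://redis:6379/0",
    backend="redis://redis:6379/1",
)

@celery_app.task
def run_job(input_values):
    # The long-running work happens here, outside the HTTP request cycle.
    return {"status": "done", "echo": input_values}

The worker Deployment would then simply run celery -A tasks worker.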

Your basic execution flow will work like this:

  1. The API endpoint adds a job to the work queue, but doesn't actually do any of the work itself.
  2. The worker takes tasks from the work queue, runs them, and writes their results back to the database.
  3. The status endpoint polls the database to return the results.
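
For illustration only, here is a rough sketch of what that flow could look like with Flask on the API side, reusing the hypothetical tasks.py module above (the endpoint paths are assumptions; Celery's result backend stands in for the database in steps 2 and 3):

# api.py - hypothetical API server built on the tasks.py sketch above
from celery.result import AsyncResult
from flask import Flask, jsonify, request

from tasks import celery_app, run_job

api = Flask(__name__)

@api.route("/jobs", methods=["POST"])
def submit_job():
    # Step 1: enqueue the job and return immediately; no work happens here.
    task = run_job.delay(request.get_json())
    return jsonify({"job_id": task.id}), 202

@api.route("/jobs/<job_id>", methods=["GET"])
def job_status(job_id):
    # Step 3: poll the result backend for the job's state and result.
    result = AsyncResult(job_id, app=celery_app)
    if result.ready():
        return jsonify({"job_id": job_id, "status": result.status, "result": result.get()})
    return jsonify({"job_id": job_id, "status": result.status}), 202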

Develop and test this locally without using any container infrastructure at all (maybe launch the database and job queue infrastructure using Docker Compose, but actually write the API server and worker outside containers).

When you go to deploy this, you will need two StatefulSets (for the database and for the Redis/RabbitMQ/...) and two Deployments (for the HTTP server and for the worker).

The actual point of this is to be able to configure a HorizontalPodAutoscaler for the two Deployments. This lets you set replicas: automatically based on some metric; the challenge is wiring metrics from your application into a metrics store (often Prometheus) and back to the HPA infrastructure. Now you can set the number of replicas for the API server proportional to the number of concurrent HTTP requests (scaling up the HTTP tier under load), and the number of replicas for the worker proportional to the queue length (scaling up workers if there are lots of incomplete jobs).

Note that none of this uses the Kubernetes API at all, and it does not attempt to dynamically create Pods. As noted above, this stack is pretty independent of any container runtime (you could run it in plain Docker, or without containers at all). There are also challenges around permissions and in cleaning up a per-job Pod once it's finished. I'd avoid this approach.

Upvotes: 0
