echan00

Reputation: 2807

Multiple Requests Causing Pod to Crash due to OOM - isn't this the job of a load balancer?

I have 2-3 machine learning models I am trying to host via Kubernetes. I don't get much usage on the models right now, but they are critical and need to be available when called upon.

I am providing access to the models via a Flask app and am using a load balancer to route traffic to it.

Everything typically works fine since requests only arrive intermittently, but I've found that if multiple requests are made at the same time, the pod crashes due to OOM. Isn't this the job of the load balancer? To make sure requests are routed appropriately? (In this case, to route the next request only after the previous ones are complete?)

Below are my Service and Deployment manifests:

apiVersion: v1
kind: Service
metadata:
  name: flask-service
  labels:
    run: flask-service
spec:
  selector:
    app: flask
  ports:
  - protocol: "TCP"
    port: 5000
    targetPort: 5000
  type: LoadBalancer
---  
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask
spec:
  selector:
    matchLabels:
      app: flask  
  replicas: 1
  template:
    metadata:
      labels:
        app: flask
    spec:
      containers:
      - name: flask
        imagePullPolicy: Always
        image: gcr.io/XXX/flask:latest
        ports:
        - containerPort: 5000
        resources:
          limits:
            memory: 7000Mi
          requests:
            memory: 1000Mi

Upvotes: 0

Views: 850

Answers (1)

Jonas

Reputation: 129065

Isn't this the job of the load balancer? To make sure requests are routed appropriately?

Yes, you are right. But...

replicas: 1

You run only a single replica, so the load balancer has no other instances of your application to route traffic to. Give it multiple instances.
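For example, the same Deployment with the replica count raised (three here is just an illustrative number; pick what fits your traffic):

spec:
  replicas: 3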

I've come to find that if multiple requests are made at the same time my pod crashes due to OOM

It sounds like your application has very limited resources.

    resources:
      limits:
        memory: 7000Mi
      requests:
        memory: 1000Mi

When your application uses more than 7000Mi it will get OOM-killed (also consider increasing the request value). If your app needs more, you can give it more memory (scale vertically) or add more instances (scale horizontally).
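A sketch of larger values (the numbers are illustrative assumptions; size them to your models' real memory footprint):

    resources:
      limits:
        memory: 12000Mi
      requests:
        memory: 4000Mi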

Horizontal Pod Autoscaler

Everything typically works fine since requests are only made intermittently

Consider using a Horizontal Pod Autoscaler; it can scale your application up to more instances when you receive more requests and back down when there are fewer. This can be based on memory or CPU usage, for example.
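A minimal HorizontalPodAutoscaler sketch targeting the Deployment above (the name flask-hpa, the replica bounds, and the 70% average memory utilization target are illustrative assumptions):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70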

Use a queue

route the next request after the previous ones are complete?

If this is the behavior you want, then you need a queue, e.g. RabbitMQ or Kafka, so that your requests are processed one at a time.
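A rough sketch of the one-at-a-time pattern with RabbitMQ and the pika Python client (the queue name, the rabbitmq hostname, and run_model are hypothetical placeholders for your setup):

import pika

QUEUE_NAME = "model-requests"  # hypothetical queue name

def run_model(body):
    # placeholder for the actual model inference on the request payload
    pass

def on_message(channel, method, properties, body):
    run_model(body)
    # ack only after the work finishes, so an unprocessed message gets redelivered
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue=QUEUE_NAME, durable=True)
# prefetch_count=1: the broker delivers at most one unacknowledged message
# to this worker, so requests are processed strictly one after another
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue=QUEUE_NAME, on_message_callback=on_message)
channel.start_consuming()

The Flask endpoint would then only publish a message to the queue and return immediately (or poll for a result), instead of running the model inside the request handler.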

Upvotes: 1
