Reputation: 2807
I have 2-3 machine learning models I am trying to host via Kubernetes. I don't get much usage on the models right now, but they are critical and need to be available when called upon.
I am providing access to the models via a flask app and am using a load balancer to route traffic to the flask app.
Everything typically works fine since requests are only made intermittently, but I've come to find that if multiple requests are made at the same time, my pod crashes due to OOM. Isn't this the job of the load balancer? To make sure requests are routed appropriately? (In this case, to route the next request only after the previous ones have completed?)
Below is my deployment:
apiVersion: v1
kind: Service
metadata:
  name: flask-service
  labels:
    run: flask-service
spec:
  selector:
    app: flask
  ports:
    - protocol: "TCP"
      port: 5000
      targetPort: 5000
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask
spec:
  selector:
    matchLabels:
      app: flask
  replicas: 1
  template:
    metadata:
      labels:
        app: flask
    spec:
      containers:
        - name: flask
          imagePullPolicy: Always
          image: gcr.io/XXX/flask:latest
          ports:
            - containerPort: 5000
          resources:
            limits:
              memory: 7000Mi
            requests:
              memory: 1000Mi
Upvotes: 0
Views: 850
Reputation: 129065
Isn't this the job of the load balancer? To make sure requests are routed appropriately?
Yes, you are right. But...
replicas: 1
You only use a single replica, so the load balancer has no other instances of your application to route traffic to. Give it multiple instances, as in the sketch below.
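A minimal sketch of that change, showing only the field that changes in the Deployment spec from the question (3 is just an example count):

spec:
  replicas: 3   # several pods instead of one; the Service spreads traffic across them

Keep in mind that each replica reserves its own memory request, so your node(s) need room for all of them.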
I've come to find that if multiple requests are made at the same time my pod crashes due to OOM
It sounds like your application has very limited resources.
resources:
  limits:
    memory: 7000Mi
  requests:
    memory: 1000Mi
When your application uses more than 7000Mi, it will be OOM-killed (also consider increasing the request value). If your app needs more, you can give it more memory (scale vertically) or add more instances (scale horizontally).
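For vertical scaling, a sketch of the resources block with placeholder numbers (size them from what a single model invocation actually uses, e.g. via kubectl top pod or your monitoring):

resources:
  requests:
    memory: 4000Mi    # placeholder: typical steady-state usage plus some margin
  limits:
    memory: 12000Mi   # placeholder: headroom for a few concurrent requests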
Everything typically works fine since requests are only made intermittently
Consider using the Horizontal Pod Autoscaler; it can scale your application up to more instances when you have more requests and scale it down when there are fewer. This can be based on memory or CPU usage, for example.
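A sketch of such an autoscaler targeting the Deployment above (the name flask-hpa, the replica bounds, and the 70% target are example values; resource-based scaling also requires the metrics-server add-on in the cluster):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask               # the Deployment from the question
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average memory use exceeds 70% of the request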
route the next request after the previous ones are complete?
If this is the behavior you want, then you need to use a queue, e.g. RabbitMQ or Kafka, to process your requests one at a time.
Upvotes: 1