Tran B. V. Son

Reputation: 839

Kubernetes CronJob gets OOMKilled

I have a Rails app deployed on Kubernetes. The app includes a CronJob that runs every day at 8 pm and takes about 6 hours to finish. A few hours after the job starts, the pod fails with an OOMKilled error. I increased the pod's memory, but the error still occurs.

This is my yaml file:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sync-data
spec:
  schedule: "0 20 * * *" # At 20:00 every day
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 100
      template:
        spec:
          serviceAccountName: sync-data
          containers:
            - name: sync-data
              resources:
                requests:
                  memory: 2024Mi # OOMKilled
                  cpu: 1000m
                limits:
                  memory: 2024Mi # OOMKilled
                  cpu: 1000m
              image: xxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/path
              imagePullPolicy: IfNotPresent
              command:
                - "/bin/sh"
                - "-c"
                - |
                  rake xxx:yyyy # Will take ~6 hours to finish
          restartPolicy: Never 

Are there any best practices for running long-running CronJobs on Kubernetes? Any help is welcome!

Upvotes: 1

Views: 3500

Answers (2)

Saurabh Nigam

Reputation: 813

A pod can be OOMKilled for two reasons:

  1. Your pod is using more memory than the limit specified. In that case, you need to increase the limit.

  2. The node itself is running out of memory because the pods on it are using more than they requested. Kubernetes then kills some pods to free up memory. In that case, you can give this pod a higher priority.

You should have monitoring in place to determine which of these is actually happening. Proper monitoring will show you which pods are behaving as expected and which are not. You could also use node selectors to pin long-running pods to specific nodes, and set a priority class so that non-cron pods are removed first.
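As a rough sketch of the priority class and node selector idea, the manifest below defines a PriorityClass and shows where it would be referenced in the CronJob's pod template. The name `cron-high-priority`, the value `100000`, and the node label `workload: batch` are assumptions made for illustration; they do not come from the question.

```yaml
# Hypothetical PriorityClass; the name and value are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: cron-high-priority
value: 100000          # higher number = higher priority
globalDefault: false   # only pods that name this class get it
description: "Priority for long-running cron pods."
---
# Fragment, not a complete manifest: these fields belong under
# jobTemplate.spec.template.spec of the CronJob in the question.
spec:
  priorityClassName: cron-high-priority
  nodeSelector:
    workload: batch    # assumed label; the target nodes must carry it
  restartPolicy: Never
```

Priority influences scheduling and the order in which pods are evicted when a node is short on memory. It does not protect a container that exceeds its own memory limit, so this only helps with the second cause above.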

Upvotes: 3

Dashrath Mundkar

Reputation: 9184

Honestly, there is no single correct resource request/limit in Kubernetes, because it depends entirely on what your pod is doing. One thing I would suggest is to deploy the Vertical Pod Autoscaler and observe the resource requests/limits it recommends for your cron job. Here is a good article to start with; it explains how you can apply this to your use case.

https://medium.com/infrastructure-adventures/vertical-pod-autoscaler-deep-dive-limitations-and-real-world-examples-9195f8422724
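A minimal sketch of what that could look like, assuming the Vertical Pod Autoscaler components and CRDs are already installed in the cluster (they are not part of a default Kubernetes installation). The object name `sync-data-vpa` is hypothetical; the target refers to the `sync-data` CronJob from the question.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sync-data-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: batch/v1beta1  # should match the CronJob's own apiVersion
    kind: CronJob
    name: sync-data
  updatePolicy:
    updateMode: "Off"          # only compute recommendations, never change pods
```

With `updateMode: "Off"` the autoscaler only publishes recommendations in the object's status (visible with `kubectl describe vpa sync-data-vpa`), which you can then copy into the CronJob's `resources` by hand. Because the job runs once a day, it may take several runs before the recommendation is meaningful.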

Upvotes: 0
