Lei Yang

Reputation: 4365

Is it possible, and how, to limit a Kubernetes Job to create a maximum number of pods if it always fails?

As a QA engineer in our company I am a daily user of Kubernetes, and we use Kubernetes Jobs to create performance-test pods. One advantage of a Job, according to the docs, is

to create one Job object in order to reliably run one Pod to completion

But in our tests this feature will create an endless stream of pods if the previous ones fail, which occupies resources in our team's shared cluster, and deleting such pods takes a lot of time (screenshot of the accumulated failed pods omitted).

Currently the job manifest is like this:

    {
      "apiVersion": "batch/v1",
      "kind": "Job",
      "metadata": {
        "name": "upgradeperf",
        "namespace": "ntg6-grpc26-tts"
      },
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "name": "upgradeperfjob",
                "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
                "command": [
                  "python",
                  "/jmeterwork/jmeter.py",
                  "-gu",
                  "[email protected]:mobility-ncs-tools/tts-cdqa-tool.git",
                  "-gb",
                  "upgradeperf",
                  "-t",
                  "JMeter/testcases/ttssvc/JMeterTestPlan_ttssvc_cmpsize.jmx",
                  "-JtestDataFile",
                  "JMeter/testcases/ttssvc/testData/avaml_opus.csv",
                  "-JthreadNum",
                  "3",
                  "-JthreadLoopCount",
                  "1500",
                  "-JresultsFile",
                  "results_upgradeperf_cavaml_opus_t3_l1500.csv",
                  "-Jhost",
                  "mtl-blade32-03.mycompany.com",
                  "-Jport",
                  "28416"
                ]
              }
            ],
            "restartPolicy": "Never",
            "imagePullSecrets": [
              {
                "name": "docker-registry-secret"
              }
            ]
          }
        }
      }
    }

In some cases, such as a misconfigured IP or port, 'reliably run one Pod to completion' is impossible, and recreating pods wastes time and resources. So is it possible, and how, to limit a Kubernetes Job to create a maximum number of pods (say 3) if it always fails?

Upvotes: 3

Views: 2052

Answers (2)

Kun Li

Reputation: 2755

Depending on your Kubernetes version, you can resolve this problem with one of these methods:

  1. Set restartPolicy: OnFailure, so the failed container is restarted in the same Pod. You will not get lots of failed Pods; instead you will get one Pod with lots of restarts.

  2. From Kubernetes 1.8 on, there is a backoffLimit parameter that controls how a failing Job is retried. It defines how many times the Job is retried before it is considered failed; the default is 6. For this parameter to work, you must set restartPolicy: Never.
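For method 2, the question's manifest would only need backoffLimit added to the Job spec (restartPolicy: Never is already set). An abridged sketch, with the command and imagePullSecrets fields omitted for brevity:

```json
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "upgradeperf",
    "namespace": "ntg6-grpc26-tts"
  },
  "spec": {
    "backoffLimit": 3,
    "template": {
      "spec": {
        "containers": [
          {
            "name": "upgradeperfjob",
            "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1"
          }
        ],
        "restartPolicy": "Never"
      }
    }
  }
}
```

With backoffLimit: 3, the Job controller stops creating replacement Pods after 3 failed attempts and marks the Job as Failed, so a misconfigured test no longer floods the shared cluster.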

Upvotes: 6

coderanger

Reputation: 54267

You probably didn't set restartPolicy: Never in your pod spec; add that and I would expect the behavior to match your expectations better.

Upvotes: 1
