Sebin

Reputation: 84

How to find number of retries in Argo workflow

I have an Argo workflow with

retryStrategy:
  limit: "2"
  retryPolicy: Always

If my workflow fails, it will be retried up to 2 times. Is there any way to know how many times my workflow was retried before it passed?

Upvotes: 0

Views: 1254

Answers (1)

crenshaw-dev

Reputation: 8412

is there any way by which I can know how many times my workflow was retried before getting passed.

retryStrategy doesn't retry Workflows. It retries tasks/steps within Workflows. If a step/task within a Workflow fails, it will be retried according to the template's retryStrategy (or the default retryStrategy defined at the Workflow level if none is defined at the template level).

There are many ways to check retry counts. Here are a few:

  1. Argo CLI

    argo submit https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/retry-on-error.yaml -n argo
    argo get @latest -n argo
    
    Name:                retry-on-error-6dmcq
    Namespace:           argo
    ServiceAccount:      default
    Status:              Succeeded
    Conditions:
      PodRunning          False
      Completed           True
    Created:             Fri Dec 10 09:42:36 -0500 (1 minute ago)
    Started:             Fri Dec 10 09:42:36 -0500 (1 minute ago)
    Finished:            Fri Dec 10 09:42:56 -0500 (1 minute ago)
    Duration:            20 seconds
    Progress:            2/2
    ResourcesDuration:   4s*(1 cpu),4s*(100Mi memory)
    
      STEP                          TEMPLATE         PODNAME                          DURATION  MESSAGE
      ✔ retry-on-error-6dmcq       error-container
      ├─✖ retry-on-error-6dmcq(0)  error-container  retry-on-error-6dmcq-1358267989  3s        Error (exit code 3)
      └─✔ retry-on-error-6dmcq(1)  error-container  retry-on-error-6dmcq-1157083656  3s
    

    The step failed once and then succeeded.

  2. Argo UI

    The Argo web UI has a similar presentation. The graph will branch, and failed attempts will appear as leaf nodes marked as failed.

  3. Other

    You could inspect the Workflow object and analyze the nodes to find failures associated with retries.

    kubectl get wf retry-on-error-6dmcq -n argo -ojson
    

    You could also look at app logs for repetition from the step in question.
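As a rough sketch of the "inspect the Workflow object" approach, the Python below counts retries from the JSON that kubectl prints. It assumes that Argo represents a step with a retryStrategy as a node of type "Retry" under status.nodes, and that this node's children list holds one entry per attempt; check the JSON from your own cluster before relying on that. The function name count_retries is made up for this example.

```python
from typing import Dict


def count_retries(workflow: dict) -> Dict[str, int]:
    """Map each retried step's display name to its retry count.

    The retry count is the number of attempts minus one, so a step that
    succeeded on the first try reports 0.
    """
    nodes = workflow.get("status", {}).get("nodes", {})
    retries: Dict[str, int] = {}
    for node in nodes.values():
        # Assumption: a step with a retryStrategy is wrapped in a node of
        # type "Retry" whose children are the individual attempts.
        if node.get("type") != "Retry":
            continue
        attempts = len(node.get("children") or [])
        name = node.get("displayName") or node.get("name", "")
        retries[name] = max(attempts - 1, 0)
    return retries
```

To use it, save the output of the kubectl command above to a file, load it with json.load, and pass the resulting dict to count_retries. For the run shown under "Argo CLI", which has two attempts under one parent, this should report one retry for that step.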

Upvotes: 1
