Johnny5

Reputation: 11

Variable number of input artifacts into a step

I have a diamond-style workflow where a single step A starts a variable number of analysis jobs B to X using withParam:. The number of jobs is based on dynamic information and is unknown until the first step runs. This all works well, except that I also want a single aggregator job Y to run over the output of all of those analysis jobs:

     B  
    / \
   / C \
  / / \ \
 A-->D-->Y
  \  .  /
   \ . /
    \./
     X

Each of the analysis jobs B-X writes artifacts, and Y needs all of them as input. I can't figure out how to specify the input for Y. Is this possible? I've tried passing in a JSON array of the artifact keys, but the pod gets stuck on pod initialisation. I can't find any examples of how to do this.

A creates several artifacts, which are consumed by B-X (one per job as part of the withParam:), so I know my artifact repository is set up correctly.
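The fan-out part of the workflow looks roughly like this (a minimal sketch; the template and parameter names are made up, step A emits a JSON list as an output parameter and withParam: expands it into one analysis pod per item):

    # Sketch of the fan-out; template and parameter names are illustrative.
    - name: main
      steps:
      - - name: generate          # step A: emits a JSON list output parameter
          template: generate
      - - name: analyse           # steps B-X: one pod per list item
          template: analyse
          arguments:
            parameters:
            - name: item
              value: "{{item}}"
          withParam: "{{steps.generate.outputs.parameters.items}}"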

Each of the jobs B-X requires a lot of CPU, so they will be running on different nodes; I don't think a shared volume will work (although I don't know much about sharing volumes across nodes).

Upvotes: 0

Views: 1337

Answers (1)

Johnny5

Reputation: 11

I posted the question as a GitHub issue:

https://github.com/argoproj/argo/issues/4120

The solution is to have every analysis job write its output artifact under the same key prefix (i.e. the same subdirectory) in the artifact repository. You then specify that prefix as the input artifact key for the aggregator, and Argo will download everything under it into a single directory before the aggregator runs. You can use {{workflow.name}} in the key to keep the paths unique per run.
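Here is a minimal sketch of the two relevant templates, with made-up names and assuming an S3-backed artifact repository; depending on your Argo version, the s3: block may need the full repository details (bucket, endpoint, credential secrets) rather than just a key.

    # Each analysis pod writes its output under the same key prefix,
    # made unique per run with {{workflow.name}}.
    - name: analyse
      inputs:
        parameters:
        - name: item
      container:
        image: alpine:3.12
        command: [sh, -c]
        args: ["echo analysed {{inputs.parameters.item}} > /tmp/result.txt"]
      outputs:
        artifacts:
        - name: result
          path: /tmp/result.txt
          archive:
            none: {}              # keep the raw file rather than a .tgz
          s3:
            key: "{{workflow.name}}/results/{{inputs.parameters.item}}.txt"

    # The aggregator asks for the whole prefix; Argo downloads everything
    # under it into /tmp/results before the main container starts.
    - name: aggregate
      inputs:
        artifacts:
        - name: results
          path: /tmp/results
          s3:
            key: "{{workflow.name}}/results"
      container:
        image: alpine:3.12
        command: [sh, -c]
        args: ["cat /tmp/results/*"]

The important part is that the output key of every analysis job and the input key of the aggregator share the same {{workflow.name}}-based prefix.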

This does mean you're restricted to a specific directory structure on your artifact repository, but for me that was a small price to pay.

For a full working solution see sarabala1979's answer on the GitHub issue.

Upvotes: 1
