Reputation: 11
I have a diamond-style workflow where a single step A starts a variable number of analysis jobs B to X using withParam:. The number of jobs depends on dynamic information and is not known until the first step has run. This all works well, except that I also want a single aggregator job Y to run over the output of all of those analysis jobs:
    B
   / \
  / C \
 / / \ \
A-->D-->Y
 \  .  /
  \ . /
   \./
    X
Each of the analysis jobs B-X writes artifacts, and Y needs all of them as input. I can't figure out how to specify the input for Y. Is this possible? I've tried passing in a JSON array of the artifact keys, but the pod gets stuck on pod initialisation. I can't find any examples of how to do this.
A creates several artifacts which are consumed by B-X (one per job as part of the withParam:), so I know my artifact repository is set up correctly.
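For reference, this is roughly what the working fan-out part looks like. It's a simplified sketch: template, image and parameter names are placeholders, and the per-job artifacts that A writes for B-X are left out, so each item is just a string parameter here:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: analysis-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: A
        template: split
      - name: analysis                 # fans out to B..X, one task per item
        template: analyse
        dependencies: [A]
        arguments:
          parameters:
          - name: chunk
            value: "{{item}}"
        withParam: "{{tasks.A.outputs.parameters.chunks}}"

  - name: split
    container:
      image: my-analysis:latest        # placeholder image
      command: [split-input]           # writes /tmp/chunks.json
    outputs:
      parameters:
      - name: chunks                   # JSON array, e.g. ["chunk-0", "chunk-1", ...]
        valueFrom:
          path: /tmp/chunks.json

  - name: analyse
    inputs:
      parameters:
      - name: chunk
    container:
      image: my-analysis:latest
      command: [analyse, "{{inputs.parameters.chunk}}"]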
Each of the jobs B-X requires a lot of CPU, so they will be running on different nodes; I don't think a shared volume will work (although I don't know much about sharing volumes across different nodes).
Upvotes: 0
Views: 1337
Reputation: 11
I posted the question as a GitHub issue:
https://github.com/argoproj/argo/issues/4120
The solution is to have every analysis job write its output under the same run-specific artifact path (i.e. a shared subdirectory). You then specify that path as the input artifact key for the aggregator, and Argo will unpack all of the previous results into that subdirectory. You can use {{workflow.name}} to create unique paths per run.
This does mean you're restricted to a specific directory structure on your artifact repository, but for me that was a small price to pay.
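As a sketch of the pattern, the two relevant template definitions (to be added to the workflow's templates list, continuing the kind of layout sketched in the question) would look roughly like this. Names are placeholders; it assumes an S3-backed artifact repository, and the key-only artifact specs shown here only work if your Argo version supports them, otherwise you have to repeat the full s3 block (bucket, endpoint, credentials) on each artifact:

  - name: analyse
    inputs:
      parameters:
      - name: chunk
    container:
      image: my-analysis:latest          # placeholder image
      command: [analyse, "{{inputs.parameters.chunk}}"]   # writes /tmp/result.json
    outputs:
      artifacts:
      - name: result
        path: /tmp/result.json
        s3:
          # every analysis job writes under the same workflow-scoped prefix
          key: "{{workflow.name}}/analysis/{{inputs.parameters.chunk}}/result.json"

  - name: aggregate
    inputs:
      artifacts:
      - name: all-results
        path: /tmp/analysis              # Argo unpacks everything under the key prefix here
        s3:
          key: "{{workflow.name}}/analysis"
    container:
      image: my-analysis:latest
      command: [aggregate, /tmp/analysis]

In the DAG, the aggregator task only needs a dependency on the fanned-out analysis task; the fan-in itself happens entirely through the shared key prefix.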
For a full working solution see sarabala1979's answer on the GitHub issue.
Upvotes: 1