basel117

Reputation: 189

Best way to parallelize in shell

I have an experiment that I need to execute many times while tracking its execution time.

My Python code needs to run multiple times with different inputs, and also multiple times for the same input, so I can compute the average execution time for every single input.

I am thinking of using parallelization (and I am doing this in bash), but I don't know how to approach it, because I need to keep track of everything and then present the results in a graph.

My concern is that if I want everything in one file with the parallel command, I will have unordered data, since I can't control which job finishes first. If instead I decide, for example, to put all the output for a given input in one file (assigned to that specific input) and then compute the average from it, I will end up with many files, which might make the next step more difficult.

I am not asking for code; I just want a better idea (if possible) of an algorithm I can use: maybe a way of controlling the order of the jobs (FIFO) created by parallel, or perhaps another parallelization tool?

Upvotes: 1

Views: 54

Answers (2)

Ole Tange

Reputation: 33740

Can one of these work for you?

parallel --keep-order myexperiment ::: a r g s 1 ::: a r g s 2 > output-in-order
parallel --results mydir/ myexperiment ::: a r g s 1 ::: a r g s 2
parallel --results myfile{1}-{2} myexperiment ::: a r g s 1 ::: a r g s 2
parallel --results myfile.tsv myexperiment ::: a r g s 1 ::: a r g s 2

If you are a scientist, the last one is interesting because the resulting TSV file can be read directly by R.
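
For example, here is a rough sketch of how the last form could be combined with repeated runs and a job log to capture per-run timings; the script name myexperiment.py, the input names, and the repetition count are assumptions for illustration, not taken from the question:

# One timed job per (input, repetition) pair; the second ::: list only serves
# to repeat each input 5 times. --joblog writes one line per job, including
# its runtime, so averages per input can be computed later regardless of the
# order in which the jobs finish.
parallel --joblog timings.tsv --results results.tsv 'python3 myexperiment.py {1}' ::: input_a input_b input_c ::: $(seq 5)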

Upvotes: 1

Ahmed

Reputation: 345

Launch all the scripts at once (for example, in a loop), and have each script redirect its result to a separate file.

To do this, include the process ID in the name of each test log, for example:

log_file_name.$$.log => log_file_name.1548.log

$$: returns the process ID of the running script (a value unique to each process)
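
A minimal sketch of this approach, assuming a hypothetical wrapper script run_one.sh around the Python experiment (all names here are placeholders):

#!/bin/bash
# run_one.sh (hypothetical wrapper): $$ is this script's own process ID,
# so every concurrent run writes to its own, uniquely named log file.
exec > "log_file_name.$$.log" 2>&1
python3 myexperiment.py "$1"

The launcher then starts all runs at once and waits for them to finish:

# e.g. 3 repetitions per input, all started in the background
for input in input_a input_b input_c; do
    for rep in 1 2 3; do
        ./run_one.sh "$input" &
    done
done
wait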

I hope this can help you.

Upvotes: 0
