basel117

Reputation: 189

Best way to parallelize in shell

I have an experiment that I need to execute many times while tracking its execution time.

My Python code needs to run multiple times with different inputs, and also multiple times for the same input, so I can compute the average execution time for every single input.

I am thinking of using parallelization (and I am doing this in bash), but I don't know how to approach it, because I need to keep track of everything and then present the results in a graph.

My concern is that if I want everything in one file with the parallel command, I will have unordered data, since I can't control which job finishes first. If instead I decide, for example, to put all the output for a given input in one file (assigned to that specific input) and then compute the average from it, I will end up with many files, which might make the next step more difficult.

I am not asking for code; I just want a better idea (if possible) of an algorithm I can use: maybe a way of controlling the order of the jobs (FIFO) created by parallel, or perhaps another parallelization tool?

Upvotes: 1

Views: 54

Answers (2)

Ole Tange

Reputation: 33740

Can one of these work for you?

parallel --keep-order myexperiment ::: a r g s 1 ::: a r g s 2 > output-in-order
parallel --results mydir/ myexperiment ::: a r g s 1 ::: a r g s 2
parallel --results myfile{1}-{2} myexperiment ::: a r g s 1 ::: a r g s 2
parallel --results myfile.tsv myexperiment ::: a r g s 1 ::: a r g s 2

If you are a scientist, the last one is interesting because the resulting TSV file can be read directly by R.
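
For example, here is a rough sketch of how the last form could be combined with repeated runs and a job log to capture per-run timings; the script name myexperiment.py, the input names, and the repetition count are assumptions for illustration, not taken from the question:

# One timed job per (input, repetition) pair; the second ::: list only serves
# to repeat each input 5 times. --joblog writes one line per job, including
# its runtime, so averages per input can be computed later regardless of the
# order in which the jobs finish.
parallel --joblog timings.tsv --results results.tsv 'python3 myexperiment.py {1}' ::: input_a input_b input_c ::: $(seq 5)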

Upvotes: 1

Ahmed

Reputation: 345

Launch all the scripts at once (for example, in a loop), and have each script redirect its result to a separate file.

To do this, include the process ID in the name of each test log, for example:

log_file_name.$$.log => log_file_name.1548.log

$$: returns the process ID of the running script (a value unique to each process)
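
A minimal sketch of this approach, assuming a hypothetical wrapper script run_one.sh around the Python experiment (all names here are placeholders):

#!/bin/bash
# run_one.sh (hypothetical wrapper): $$ is this script's own process ID,
# so every concurrent run writes to its own, uniquely named log file.
exec > "log_file_name.$$.log" 2>&1
python3 myexperiment.py "$1"

The launcher then starts all runs at once and waits for them to finish:

# e.g. 3 repetitions per input, all started in the background
for input in input_a input_b input_c; do
    for rep in 1 2 3; do
        ./run_one.sh "$input" &
    done
done
wait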

I hope this can help you.

Upvotes: 0
