Reputation: 233
Scenario: an S3 bucket has 1000 files. I have two machines, and each machine has two drives, /dev/sda and /dev/sdb. Constraints: no single drive can hold all 1000 files, and neither machine can hold all 1000 files. Desired outcome: distribute the 1000 files across the 4 drives on the two machines using GNU parallel.
I tried things like:
parallel --xapply --joblog out.txt -S:,R echo {1} {2} ::: "/dev/sda" "/dev/sdb" ::: {0..10}
But I get:
Seq Host Starttime      JobRuntime Send Receive Exitval Signal Command
2   :    1414040436.607 0.037      0    0       0       0      echo /dev/sda 1
4   :    1414040436.615 0.030      0    0       0       0      echo /dev/sda 3
6   :    1414040436.623 0.024      0    0       0       0      echo /dev/sda 5
8   :    1414040436.632 0.015      0    0       0       0      echo /dev/sda 7
10  :    1414040436.640 0.006      0    0       0       0      echo /dev/sda 9
1   R    1414040436.603 0.088      0    0       0       0      echo /dev/sdb 0
3   R    1414040436.611 0.092      0    0       0       0      echo /dev/sdb 2
5   R    1414040436.619 0.095      0    0       0       0      echo /dev/sdb 4
7   R    1414040436.628 0.095      0    0       0       0      echo /dev/sdb 6
9   R    1414040436.636 0.096      0    0       0       0      echo /dev/sdb 8
11  R    1414040436.645 0.094      0    0       0       0      echo /dev/sdb 10
Where 'R' is the remote host's IP. How do I distribute the files (I have all the names in a file) from S3 to the 4 drives? Thank you.
Upvotes: 3
Views: 153
Reputation: 33685
GNU Parallel is good at starting a new job when an old one has finished: it assigns jobs to servers on the fly, not beforehand.
What you are looking for is a way to do this beforehand.
Your --xapply approach seems sound, but you need to force GNU Parallel to distribute the jobs evenly across the hosts. Your current approach depends on how fast each host finishes, and that will not work in general.
So something like:
parallel echo {1}//{2} ::: sda sdb ::: server1 server2 | parallel --colsep '//' --xapply echo copy {3} to {1} on {2} :::: - filenames.txt
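To see what that does before running anything real, here is a quick dry run (a sketch; the tiny test filenames.txt and the -k flag, which only keeps the output in job order for readability, are mine and not part of the answer):

# small test list, for illustration only
printf 'file%s\n' {0..5} > filenames.txt
# first stage emits the 4 drive//server combinations,
# second stage pairs them round-robin with the filenames
parallel -k echo {1}//{2} ::: sda sdb ::: server1 server2 | parallel -k --colsep '//' --xapply echo copy {3} to {1} on {2} :::: - filenames.txt
# copy file0 to sda on server1
# copy file1 to sda on server2
# copy file2 to sdb on server1
# copy file3 to sdb on server2
# copy file4 to sda on server1
# copy file5 to sda on server2

Because --xapply recycles the shorter input source, the 4 drive/server pairs wrap around until every filename is consumed, which is what spreads the 1000 files evenly.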
Or:
parallel --xapply echo copy {3} to {1} on {2} ::: sda sda sdb sdb ::: server1 server2 server1 server2 :::: filenames.txt
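Once the pairing looks right, the echo can be swapped for a real transfer. A minimal sketch of that last form, assuming the drives are mounted at /mnt/sda and /mnt/sdb on each server, the AWS CLI is installed there, the bucket is called my-bucket, and the filenames contain no spaces (all of these are assumptions, not part of the question):

# hypothetical: pull each file from S3 straight onto its assigned drive/server
parallel --xapply ssh {2} "aws s3 cp s3://my-bucket/{3} /mnt/{1}/{3}" ::: sda sda sdb sdb ::: server1 server2 server1 server2 :::: filenames.txt

The --joblog out.txt option from the question can be added back to record which copies succeeded.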
Upvotes: 1