How to read a combination list by bash/shell?

Question

I have folders named A_1, A_2, A_3, ... up to A_561.

Each folder contains sub-directories named B_1, B_2, B_3, ... up to B_34.

In the B_1 folder, there are files named F_1_1.txt, F_1_2.txt ... F_1_38.txt, and F_2_1.txt, F_2_2.txt ... F_2_38.txt.

In the B_2 folder, there are files named F_1_1.txt, F_1_2.txt ... F_1_38.txt, and F_2_1.txt, F_2_2.txt ... F_2_38.txt.

I then run a Java program to process these files:

java -jar beagle.28Sep18.793.jar  \
gt=/A_1/B_1/F_1_1.txt /A_1/B_1/F_2_1 out=/C/test_1.out;.....     

java -jar beagle.28Sep18.793.jar  \
gt=/A_1/B_2/F_1_2.txt /A_1/B_2/F_3_2 out=/C/test_2.out;.....    

java -jar beagle.28Sep18.793.jar  \
gt=/A_2/B_3/F_3_1.txt /A_2/B_3/F_4_1 out=/C/test_3.out;    

java -jar beagle.28Sep18.793.jar  \
gt=/A_3/B_1/F_1_38.txt /A_3/B_1/F_1_38 out=/C/test_4.out;

I can read the files with a nested for loop in bash:

for folder in $(seq 561); do
    for file in $(seq 1 34); do
        for sample in $(seq 1 38); do
            java -jar beagle.28Sep18.793.jar gt=/A_"$folder"/B_"$file"/F_"$file"_"$sample".txt /A_"$folder"/B_"$file"/F_"$file"_"$sample" out=/C/test_"$file"_"$sample".out
        done
    done
done

This loop runs very slowly. I know some of the files do not exist, but Java just skips them and moves on to the next one. For this case, how can I write a command that reads the files correctly?

Mark Setchell · Accepted Answer

I can't be sure I have understood your question correctly because it is so poorly formatted, but I think you want to run a Java program on each text file in a folder hierarchy. You can do that relatively easily, and quickly, by running the jobs in parallel with GNU Parallel.

So here's how to generate a list of the text files with find:

find . -name \*.txt -print

If that looks correct, you can run the same again but null-terminate each name and pass it into GNU Parallel like this:

find . -name \*.txt -print0 | parallel -0
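
As a small illustration of why the names are null-terminated, the sketch below feeds the same find output into xargs -0 instead of GNU Parallel and prints each name in brackets. The function name show_names and the bracket format are made up for this example; only find -print0 and xargs -0 are standard options.

```shell
# show_names is a hypothetical helper; the directory argument is an assumption.
show_names() {
    # -print0 ends every name with a NUL byte and xargs -0 splits on NUL,
    # so a name that contains spaces is passed on as one single argument.
    find "${1:-.}" -name '*.txt' -print0 | xargs -0 -n 1 printf '[%s]\n'
}
```

Calling show_names . in a folder that holds a file such as "a b.txt" should print that name inside one pair of brackets rather than splitting it at the space.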

Now you want to run a Java program for each file and use an incrementing number for the output file, so we can do a dry-run, which only prints what it would do without actually doing anything, like this:

find . -name \*.txt -print0 | parallel -0 --dry-run java -jar beagle.28Sep18.793.jar gt={} out=/C/test_{#}.out

If that looks correct, remove --dry-run and run it again. It will run as many instances of Java in parallel as you have CPU cores and keep them all busy until the jobs are done.
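
On the point in the question about files that do not exist: one possible approach, sketched here under assumptions, is to let a shell glob produce only the paths that are really on disk instead of counting through fixed ranges. build_jobs is a hypothetical helper, and its two arguments (the root of the A_*/B_* tree and the output directory) are assumptions. It only prints commands, with a single gt= file each, and it assumes the paths contain no whitespace.

```shell
# build_jobs ROOT OUT_DIR: print one java command per existing input file.
# Hypothetical helper; the body is a subshell ( ... ) so n and f stay private.
build_jobs() (
    n=0
    # The glob expands only to files that exist, so gaps in the numbering
    # never reach java.
    for f in "$1"/A_*/B_*/F_*.txt; do
        # If nothing matches, the pattern itself is left in $f; skip it.
        [ -f "$f" ] || continue
        n=$((n + 1))
        printf 'java -jar beagle.28Sep18.793.jar gt=%s out=%s/test_%d.out\n' "$f" "$2" "$n"
    done
)
```

For example, build_jobs . /C > jobs.txt writes the list to a file (jobs.txt is just an example name) that can be checked by eye and then run with sh jobs.txt. The numbers follow glob order, which is alphabetical, so A_10 sorts before A_2.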
