Reputation: 157
I have folders named A_1, A_2, A_3, and so on up to A_561.
Each of these folders contains sub-directories B_1, B_2, B_3... up to B_34.
In the B_1 folder, there are files named F_1_1.txt, F_1_2.txt... F_1_38.txt. F_2_1.txt, F_2_2.txt... F_2_38.txt.
In the B_2 folder, there are files named F_1_1.txt, F_1_2.txt... F_1_38.txt. F_2_1.txt, F_2_2.txt... F_2_38.txt.
Then I will run a Java program to process these files:
java -jar beagle.28Sep18.793.jar \
gt=/A_1/B_1/F_1_1.txt /A_1/B_1/F_2_1 out=/C/test_1.out;.....
java -jar beagle.28Sep18.793.jar \
gt=/A_1/B_2/F_1_2.txt /A_1/B_2/F_3_2 out=/C/test_2.out;.....
java -jar beagle.28Sep18.793.jar \
gt=/A_2/B_3/F_3_1.txt /A_2/B_3/F_4_1 out=/C/test_3.out;
java -jar beagle.28Sep18.793.jar \
gt=/A_3/B_1/F_1_38.txt /A_3/B_1/F_1_38 out=/C/test_4.out;
I can run a bash for loop to read the files:
for folder in $(seq 561); do
    for file in $(seq 1 34); do
        for sample in $(seq 1 38); do
            java -jar beagle.28Sep18.793.jar gt=/A_"$folder"/B_"$file"/F_"$file"_"$sample".txt /A_"$folder"/B_"$file"/F_"$file"_"$sample" out=/C/test_"$file"_"$sample".out
        done
    done
done
This command runs very slowly. I know some files do not exist, but Java will skip them and move on to the next one. In this case, how can I write a command that reads the files correctly?
Upvotes: 2
Views: 319
Reputation: 207465
I can't be sure I have understood your question correctly because it is so poorly formatted, but I think you want to run a Java program on each text file in a folder hierarchy. I think you can do that relatively easily, and fast, by running the jobs in parallel with GNU Parallel.
So here's how to generate a list of the text files with find:
find . -name \*.txt -print
If that looks correct, you can run the same again but null-terminate each name and pass it into GNU Parallel like this:
find . -name \*.txt -print0 | parallel -0
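If the tree also holds .txt files that are not inputs, the search can probably be narrowed to the layout described in the question. A minimal sketch, assuming the A_*/B_*/F_*.txt layout; the helper name list_inputs is hypothetical:

```shell
# Hypothetical helper: print, null-terminated, only the regular files that
# match the assumed A_*/B_*/F_*.txt layout under the given root directory.
# Note that in a find -path pattern, * also matches "/" characters.
list_inputs() {
    find "$1" -type f -path "$1/A_*/B_*/F_*.txt" -print0
}

# Possible usage (assumes GNU Parallel is installed):
#   list_inputs . | parallel -0 --dry-run echo {}
```

Because find only reports files that exist, missing F_*.txt files never reach the Java program in the first place.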
Now you want to run a Java program for each file and use an incrementing number for the output file, so we can do a dry-run, which only prints what it would do without actually doing anything, like this:
find . -name \*.txt -print0 | parallel -0 --dry-run java -jar beagle.28Sep18.793.jar gt={} out=/C/test_{#}.out
If that looks correct, remove the --dry-run and run it again. It will run as many instances of Java in parallel as you have CPU cores and keep them all busy until the jobs are done.
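As a variation on the pipeline above, the job list can be written out first so it can be inspected before anything runs. This is only a sketch under assumptions: emit_jobs is a hypothetical helper, it passes a single gt= input per job as in the commands above (not the second path from the question), it assumes paths contain no whitespace, and outputs are numbered in glob order (A_1, A_10, A_100, ...) rather than numeric order.

```shell
# Hypothetical helper: print one Beagle command line per input file that
# actually exists under ROOT, writing numbered outputs into OUT_DIR.
# The body runs in a subshell so its variables do not leak to the caller.
emit_jobs() (
    root=$1
    out_dir=$2
    n=0
    for f in "$root"/A_*/B_*/F_*.txt; do
        # Skip an unmatched glob (left as a literal string) and non-files.
        [ -f "$f" ] || continue
        n=$((n + 1))
        printf 'java -jar beagle.28Sep18.793.jar gt=%s out=%s/test_%s.out\n' \
            "$f" "$out_dir" "$n"
    done
)

# Possible usage (assumes GNU Parallel is installed):
#   emit_jobs . /C > jobs.txt    # review jobs.txt first
#   parallel < jobs.txt          # each line is run as one command
```

When GNU Parallel is given no command, it runs each input line as a command, so the reviewed list can be fed to it unchanged.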
Upvotes: 2