Reputation: 788
We've got extensive amount of tarballs and in each tarball I need to search for a particular pattern only in some files which names are known before hand.
As the disk access is slower and there is quite a few cores and plenty of memory available on this system, we aim minimising the disk writes and going through the memory as much as possible.
echo "a.txt" > file_subset_in_tar.txt
echo "b.txt" >> file_subset_in_tar.txt
echo "c.txt" >> file_subset_in_tar.txt
tarball_name="tarball.tgz";
pattern="mypattern"
echo "pattern: $pattern"
(parallel -j-2 tar xf $tarball_name -O ::: `cat file_subset_in_tar.txt` | grep -ac "$pattern")
This works just fine on the bash terminal directly. However, when I paste this in a script with bash
bang on the top, it just prints zero.
If I change the $pattern
to a hard coded string, it runs ok. It feels like there is something wrong with the pipe sequencing or something similar. So, ideally an update to the attempt above or another solution which satisfies the mentioned disk/memory use requirements would be much appreciated.
Upvotes: 1
Views: 173
Reputation: 11157
I believe your parallel
command is constructed incorrectly. You can run the pipeline of commands like the following:
parallel -j -2 "tar xf $tarball_name -O {} | grep -ac $pattern" :::: file_subset_in_tar.txt
Also note that the backticks and use of cat
is unnecessary, parameters can be fed to parallel
from a file using ::::
.
Upvotes: 1