abdus_salam

Reputation: 788

How to untar certain files from an archive and grep them in parallel in bash

We have a large number of tarballs. In each tarball I need to search for a particular pattern, but only in certain files whose names are known beforehand.

Disk access is slow, while this system has quite a few cores and plenty of memory, so we aim to minimise disk writes and work in memory as much as possible.
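To illustrate the no-disk-writes requirement, here is a minimal sketch, assuming GNU tar (bsdtar also accepts `-O` and `-T`). `tar -O` writes the extracted members to stdout instead of to disk, and `-T` reads the member names from a file, so a single tar process reads the archive once and streams only the listed files into grep. The function name `grep_members` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines matching a pattern across selected members
# of one tarball without writing anything to disk.
#   $1 = tarball, $2 = file listing member names (one per line), $3 = pattern
grep_members() {
    local tarball=$1 member_list=$2 pattern=$3
    # -O: extract to stdout; -T: take member names from a file.
    # The selected members are concatenated on stdout in archive order.
    # No -z: GNU tar detects gzip compression itself when reading a file.
    tar -xOf "$tarball" -T "$member_list" | grep -ac -- "$pattern"
}
```

Usage: `grep_members tarball.tgz file_subset_in_tar.txt "$pattern"` prints one total count for the listed members.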

echo "a.txt" > file_subset_in_tar.txt
echo "b.txt" >> file_subset_in_tar.txt
echo "c.txt" >> file_subset_in_tar.txt
tarball_name="tarball.tgz";
pattern="mypattern"
echo "pattern: $pattern"

(parallel -j-2 tar xf $tarball_name -O ::: `cat file_subset_in_tar.txt` | grep -ac "$pattern")

This works fine when run directly in a bash terminal. However, when I put it in a script with a bash shebang at the top, it just prints zero.

If I replace $pattern with a hard-coded string, it works. It feels as if something is wrong with the ordering of the pipe, or something similar. Either a fix for the attempt above or another solution that meets the disk/memory requirements described above would be much appreciated.

Answers (1)

cody

Reputation: 11157

I believe your parallel command is constructed incorrectly. You can run the whole pipeline for each file by passing it to parallel as a single quoted command:

parallel -j -2 "tar xf $tarball_name -O {} | grep -ac $pattern" :::: file_subset_in_tar.txt
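One caveat with the command above: `$pattern` is expanded by the outer shell and spliced into the command string unquoted, so the shell that parallel spawns word-splits it (a pattern such as `my pattern` would make grep treat `pattern` as a file name). A minimal sketch of a variant, assuming GNU parallel: export the variables and reference them inside a single-quoted command, so each job's shell expands them itself. `-k` keeps the output in input order and `--tag` prefixes each count with its file name. The function name `count_per_member` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<member>\t<count>" for each listed member.
#   $1 = tarball, $2 = file listing member names, $3 = pattern
count_per_member() {
    local tarball_name=$1 list_file=$2 pattern=$3
    # Exported so the shell that runs each parallel job can see them.
    export tarball_name pattern
    # Single quotes: "$tarball_name" and "$pattern" are expanded by the job's
    # shell, not spliced into the command text; parallel quotes {} itself.
    parallel -j -2 -k --tag 'tar -xOf "$tarball_name" {} | grep -ac -- "$pattern"' \
        :::: "$list_file"
}
```

Since the original attempt wanted one total, the per-file counts can be summed afterwards, e.g. `count_per_member tarball.tgz file_subset_in_tar.txt "$pattern" | awk '{s += $NF} END {print s + 0}'`.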

Also note that the backticks and `cat` are unnecessary: arguments can be fed to parallel from a file using `::::`.
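With many tarballs, `::::` can just as well read a list of tarball names, giving one job per tarball instead of one job per member. This may be cheaper for `.tgz` files: a gzip stream is not seekable, so every separate `tar -xOf tarball.tgz member` job has to decompress the archive from the start, while a single tar with `-T` decompresses it once. A minimal sketch, assuming GNU parallel and GNU tar; the function name `grep_all_tarballs` and the list files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<tarball>\t<count>" for every tarball listed.
#   $1 = file listing tarballs, $2 = file listing member names, $3 = pattern
grep_all_tarballs() {
    local tarball_list=$1 member_list=$2 pattern=$3
    # Exported so each job's shell can expand them (see quoting note above).
    export member_list pattern
    # One job per tarball; each job reads its archive once and streams only
    # the listed members to grep, so nothing is written to disk.
    parallel -j -2 -k --tag 'tar -xOf {} -T "$member_list" | grep -ac -- "$pattern"' \
        :::: "$tarball_list"
}
```

This trades per-member parallelism inside one archive for parallelism across archives, which seems a better fit when there are many tarballs to process.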
