Reputation: 658
I am filtering very long text files in Linux (usually > 1GB) to get only the lines I am interested in. I use this command:
cat ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | $decoder > ./path/to/result.txt
$decoder is the path to a program I was given to decode these files. The problem is that it only accepts lines with 7 fields, that is, 7 strings separated by spaces (e.g. "11 22 33 44 55 66 77"). Whenever a line with more or fewer fields is passed into this program, it crashes and I get a broken pipe error message.
To fix it, I wrote a super simple Bash script:
while read line ; do
    if [[ $( echo $line | awk '{ print NF }') == 7 ]]; then
        echo $line;
    fi;
done
But the problem is that now it takes ages to finish. Before it took seconds, and now it takes ~30 minutes.
Does anyone know a better/faster way to do this? Thank you in advance.
Upvotes: 2
Views: 336
Reputation: 75568
Well, perhaps you can insert awk in between instead. No need to rely on Bash:
LC_ALL=C fgrep -f ./my/patterns.txt ./my/file.txt | awk 'NF == 7' | "$decoder" > ./path/to/result.txt
Or perhaps awk can go first. Performance may be better that way:
awk 'NF == 7' ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | "$decoder" > ./path/to/result.txt
You can merge fgrep and awk into a single awk command; however, I'm not sure whether that would interfere with anything that requires LC_ALL=C, or whether it would actually perform better.
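If you want to try it, a minimal sketch of that merged approach could look like this (assuming the entries in ./my/patterns.txt are plain fixed strings to be matched anywhere in the line, as fgrep -f does; awk's index() gives fixed-string matching, but looping over every pattern for each line may well be slower than fgrep's optimized search):

LC_ALL=C awk '
    # First file: store every pattern line as an array key
    NR == FNR { pats[$0]; next }
    # Second file: keep only 7-field lines containing at least one pattern
    NF == 7 { for (p in pats) if (index($0, p)) { print; next } }
' ./my/patterns.txt ./my/file.txt | "$decoder" > ./path/to/result.txt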
Upvotes: 2