Reputation: 2255
So I'm working on a script to search through tar files for specific strings, basically zgrep. For some reason, though, it freezes up on much larger files...
Any ideas?
#!/bin/bash
tarname=$1
pattern=$2
max=$3
count=1
tar -tf $tarname | while read -r FILE
do
    tar -xf $tarname $FILE
    count=$(expr $count + 1)
    if [ "$count" == "$max" ]; then
        rm $FILE
        break
    fi
    if grep $pattern $FILE; then
        echo "found pattern in :" $FILE
        mv $FILE stringfind
    else
        rm $FILE
    fi
done
if [ $(ls stringfind | wc -l) -eq 0 ]; then
    echo "File Not Found"
fi
I need it done this way to reduce space limitations, but why exactly is it not going through to the other files? I did a loop printout test and it only looped once or twice before stopping...
So it's reading through the entire tar file every time I call "read"? As in, if a tar has 100 files, it's reading 100 x 100 = 10,000 times?
Upvotes: 1
Views: 82
Reputation: 189678
You keep on opening and closing the tarfile, reading it from the beginning each time. It would be much more economical to just extract all the files in one go, if you can.
If you can't, moving to a language with library support for tar files would be my suggestion; with https://docs.python.org/2/library/tarfile.html, what you need looks doable in just a few lines of Python.
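For the first option (staying in the shell), a minimal sketch of the extract-once approach could look like the following; the extracted scratch directory is an assumption for the example, not part of the original script, and $tarname/$pattern are reused from it:

mkdir -p extracted stringfind
# Read the archive a single time and extract everything into a scratch directory.
tar -xf "$tarname" -C extracted
# Keep the files that contain the pattern, mirroring the original
# script's stringfind directory.
grep -rl -- "$pattern" extracted | while read -r f; do
    mv "$f" stringfind/
done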
Upvotes: 2
Reputation: 107080
You are reading each file name from the tar listing, then running tar -xf on the archive for each one, rescanning it every time. This is fairly inefficient. Just extract the whole tarball, then use grep -l -R
(which works on most systems) to search for the files that contain the strings. The -l means list the file name and not the line in the file that matches the regex.
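A sketch of that, assuming the archive fits on disk once fully extracted (the extracted directory name is made up for the example):

# Read the tarball once and extract all of it.
mkdir -p extracted
tar -xf "$tarname" -C extracted
# -R recurses into the directory, -l prints only the names of matching files.
grep -l -R -- "$pattern" extracted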
Why does it work on small tar files and not large ones? It could be this logic:
if [ "$count" == "$max" ]; then
rm $FILE
break
fi
You're counting the number of times you've been through the loop, and you break when you hit max. If max is 100, this will fail on tarballs that contain 1,000 files where the string is in the 200th file.
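If the per-file loop is kept anyway (still inefficient, as noted above), one hedged adjustment, assuming max was meant to cap the number of matching files collected rather than the number of entries scanned, is to count matches instead of iterations:

count=0
tar -tf "$tarname" | while read -r FILE
do
    tar -xf "$tarname" "$FILE"
    if grep -q -- "$pattern" "$FILE"; then
        echo "found pattern in: $FILE"
        mv "$FILE" stringfind/
        # Only matches advance the counter, so the loop no longer gives up
        # before reaching a later file that contains the string.
        count=$((count + 1))
        [ "$count" -ge "$max" ] && break
    else
        rm -- "$FILE"
    fi
done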
Upvotes: 1