Reputation: 1086
I have several .tgz archives of log files, each log containing a few hundred to a few thousand lines. I also have a list of error strings. I have to read every log file inside each archive and check whether any of the error strings is present in it. I also need the name of the file in which the error pattern was found.
errorList = ["errorPattern1", "errorPattern2",..., "errorPatternN"]
What is the fastest way to do this in Python?
Upvotes: 0
Views: 1561
Reputation: 4418
Use nested loops: the outer one iterates over the '.tgz' files in the directory, the inner one over the members of each tar archive. Read the entire contents of each member at once, then check whether any of the error patterns occurs in the text.
Something like this:
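import glob, tarfile

# error_list is the list of error strings from the question
for fname in glob.iglob('*.tgz'):
    # open the archive by its actual name; 'r:gz' reads a gzip-compressed tar
    with tarfile.open(fname, 'r:gz') as tar:
        for info in tar:
            if not info.isfile():
                continue  # skip directories and links, which have no content
            # decode so the str patterns can be matched on Python 3
            text = tar.extractfile(info).read().decode('utf-8', 'replace')
            if any(msg in text for msg in error_list):
                print("an error message was found in: %s (%s)" % (info.name, fname))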
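If the list of error strings is long, one possible variation is to combine them into a single compiled regular expression, so each file is scanned once instead of once per pattern. The sketch below assumes Python 3, UTF-8 patterns that should be matched literally, and log files small enough to read into memory. The function name `find_errors` is made up for illustration. Whether this is actually faster depends on the data, so it is worth timing both versions.

```python
import re
import tarfile


def find_errors(archive_path, patterns):
    """Yield (member_name, matched_pattern) for each file in a .tgz
    archive that contains at least one of the given literal patterns."""
    if not patterns:
        return  # an empty alternation would match every file
    # re.escape makes each pattern match literally; the regex works on
    # bytes so the file contents do not have to be decoded first.
    regex = re.compile(
        b"|".join(re.escape(p.encode("utf-8")) for p in patterns)
    )
    with tarfile.open(archive_path, "r:gz") as tar:
        for info in tar:
            if not info.isfile():
                continue  # directories and links have no content
            data = tar.extractfile(info).read()
            match = regex.search(data)
            if match:
                # report only the first pattern found in this file
                yield info.name, match.group(0).decode("utf-8")
```

It could be used inside the same `glob.iglob('*.tgz')` loop as above, printing the archive name together with each `(member_name, matched_pattern)` pair it yields.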
Upvotes: 2