Reputation: 727
I'm trying to parse a number of log files from a log directory, to search for any number of strings in a list along with a server name. I feel like I've tried a million different options, and I have it working fine with just one log file.. but when I try to go through all the log files in the directory I can't seem to get anywhere.
if args.f:
    logs = args.f
else:
    try:
        logs = glob("/var/opt/cray/log/p0-current/*")
    except IndexError:
        print "Something is wrong. p0-current is not available."
        sys.exit(1)

valid_errors = ["error", "nmi", "CATERR"]

logList = []
for log in logs:
    logList.append(log)

#theLog = open("logList")
#logFile = log.readlines()
#logFile.close()

#printList = []
#for line in logFile:
#    if (valid_errors in line):
#        printList.append(line)
#
#for item in printList:
#    print item

#    with open("log", "r") as tmp_log:
#        open_log = tmp_log.readlines()
#        for line in open_log:
#            for down_nodes in open_log:
#                if valid_errors in open_log:
#                    print valid_errors
down_nodes is a pre-filled list, built further up the script, containing the servers which are marked as down.
Commented out are some of the various attempts I've been working through.
logList = []
for log in logs:
    logList.append(log)
I thought this might be the way forward: put each individual log file in a list, then loop through that list and use open() followed by readlines(). But I'm missing some kind of logic here.. maybe I'm not thinking correctly.
I could really do with some pointers here please.
Thanks.
Upvotes: 3
Views: 2344
Reputation: 10090
So your last for loop is redundant because logs is already a list of strings. With that information, we can iterate through logs and do something for each log.
for log in logs:
    with open(log) as f:
        for line in f.readlines():
            if any(error in line for error in valid_errors):
                #do stuff
The line if any(error in line for error in valid_errors): checks the line to see if any of the errors in valid_errors are in it. The syntax is a generator expression that yields each error in valid_errors in turn.
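As a quick, self-contained illustration of that any() check (the log lines below are invented for the example):

```python
valid_errors = ["error", "nmi", "CATERR"]

# Hypothetical log lines, made up for illustration.
lines = [
    "node c0-0c0s0n1 reported CATERR",
    "heartbeat ok",
    "disk error on sda",
]

# any() short-circuits: it stops at the first matching error string.
matched = [line for line in lines if any(err in line for err in valid_errors)]
print(matched)  # the CATERR line and the "disk error" line
```

Note that the substring test is case-sensitive, so "error" will not match "Error" unless you normalize with line.lower() first.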
To answer your question involving down_nodes, I don't believe you should include it in the same any(). You should try something like

if any(error in line for error in valid_errors) and \
   any(node in line for node in down_nodes):
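A minimal sketch of that combined condition, with invented server names and log lines standing in for your real down_nodes data:

```python
valid_errors = ["error", "nmi", "CATERR"]
down_nodes = ["c0-0c0s2n1", "c0-0c1s0n3"]  # hypothetical down servers

lines = [
    "c0-0c0s2n1: NMI received",             # down node, but "NMI" != "nmi" (case-sensitive)
    "c0-0c0s2n1: hardware error detected",  # down node AND matching error
    "c0-0c2s0n0: hardware error detected",  # matching error, but node is not down
]

# Keep only lines that mention both a known error and a down node.
hits = [
    line
    for line in lines
    if any(error in line for error in valid_errors)
    and any(node in line for node in down_nodes)
]
print(hits)  # only the second line satisfies both checks
```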
Upvotes: 1
Reputation: 1735
Firstly you need to find all logs:
import os
import fnmatch

def find_files(pattern, top_level_dir):
    for path, dirlist, filelist in os.walk(top_level_dir):
        for name in fnmatch.filter(filelist, pattern):
            yield os.path.join(path, name)
For example, to find all *.txt files in the current dir:
txtfiles = find_files('*.txt', '.')
Then get file objects from the names:
def open_files(filenames):
    for name in filenames:
        yield open(name, 'r', encoding='utf-8')
Finally individual lines from files:
def lines_from_files(files):
    for f in files:
        for line in f:
            yield line
Since you want to find some errors the check could look like this:
import re

def find_errors(lines):
    pattern = re.compile('(error|nmi|CATERR)')
    for line in lines:
        if pattern.search(line):
            print(line)
You can now process a stream of lines generated from a given directory:
txt_file_names = find_files('*.txt', '.')
txt_files = open_files(txt_file_names)
txt_lines = lines_from_files(txt_files)
find_errors(txt_lines)
The idea of processing logs as a stream of data comes from a talk by David Beazley.
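Putting the pieces together, here is a self-contained run of that pipeline against a throwaway directory (the file names and contents are invented so the snippet can run anywhere):

```python
import fnmatch
import os
import re
import tempfile

# Same generator functions as above, repeated so this snippet is standalone.
def find_files(pattern, top_level_dir):
    for path, dirlist, filelist in os.walk(top_level_dir):
        for name in fnmatch.filter(filelist, pattern):
            yield os.path.join(path, name)

def open_files(filenames):
    for name in filenames:
        yield open(name, 'r', encoding='utf-8')

def lines_from_files(files):
    for f in files:
        for line in f:
            yield line

def find_errors(lines):
    pattern = re.compile('(error|nmi|CATERR)')
    for line in lines:
        if pattern.search(line):
            print(line, end='')

with tempfile.TemporaryDirectory() as tmp:
    # Create two small fake log files.
    with open(os.path.join(tmp, 'a.txt'), 'w', encoding='utf-8') as f:
        f.write('boot ok\nCATERR on node 12\n')
    with open(os.path.join(tmp, 'b.txt'), 'w', encoding='utf-8') as f:
        f.write('all quiet\n')

    # Nothing is read until find_errors pulls lines through the chain.
    find_errors(lines_from_files(open_files(find_files('*.txt', tmp))))
```

Because each stage is a generator, only one line is held in memory at a time, which matters when log directories are large.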
Upvotes: 1