jonnybinthemix

Reputation: 727

Parse multiple log files for strings

I'm trying to parse a number of log files from a log directory, searching for any of a list of strings along with a server name. I feel like I've tried a million different options, and I have it working fine with just one log file, but when I try to go through all the log files in the directory I can't seem to get anywhere.

if args.f:
    logs = args.f
else:
    logs = glob("/var/opt/cray/log/p0-current/*")
    if not logs:  # glob() returns an empty list when nothing matches; it never raises IndexError
        print "Something is wrong. p0-current is not available."
        sys.exit(1)

valid_errors = ["error", "nmi", "CATERR"]

logList = []
for log in logs:
    logList.append(log)



#theLog = open("logList")
#logFile = log.readlines()
#logFile.close()
#printList = []

#for line in logFile:
#    if (valid_errors in line):
#        printList.append(line)
#
#for item in printList:
#    print item


#    with open("log", "r") as tmp_log:

#       open_log = tmp_log.readlines()
#           for line in open_log:
#               for down_nodes in open_log:
#                   if valid_errors in open_log:
#                       print valid_errors

down_nodes is a list, pre-filled further up the script, of the servers which are marked as down.

Commented out are some of the various attempts I've been working through.

logList = []
for log in logs:
    logList.append(log)

I thought the way forward might be to put each individual log file in a list, then loop through that list and use open() followed by readlines(), but I'm missing some kind of logic here; maybe I'm not thinking about it correctly.

I could really do with some pointers here please.

Thanks.

Upvotes: 3

Views: 2344

Answers (2)

wpercy

Reputation: 10090

Your last for loop is redundant, because logs is already a list of strings (glob() returns one, and args.f should be one too). With that, we can iterate through logs directly and do something for each log:

for log in logs:
    with open(log) as f:
        for line in f.readlines():
            if any(error in line for error in valid_errors):
                #do stuff

The line if any(error in line for error in valid_errors): checks whether any of the strings in valid_errors appear in the line. The part inside any() is a generator expression that yields the boolean result of error in line for each error in valid_errors; any() returns True as soon as one of those checks succeeds.
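A minimal standalone sketch of that check, reusing the valid_errors list from the question (the log lines here are made up for illustration):

```python
valid_errors = ["error", "nmi", "CATERR"]

line = "Jul 01 12:00:01 kernel: disk error on sda"

# any() short-circuits: it returns True as soon as one search term is a substring
print(any(error in line for error in valid_errors))   # True ("error" matches)

clean = "Jul 01 12:00:02 kernel: all checks passed"
print(any(error in clean for error in valid_errors))  # False
```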

To answer your question involving down_nodes, I don't believe you should include this in the same any(). You should try something like

if any(error in line for error in valid_errors) and \
    any(node in line for node in down_nodes):
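Putting both conditions together in a full loop might look like this (the lines and down_nodes values below are hypothetical stand-ins, since the real log files and node list live on the asker's system):

```python
valid_errors = ["error", "nmi", "CATERR"]
down_nodes = ["node12", "node47"]  # hypothetical stand-in for the script's list

lines = [
    "node12 reported a CATERR during boot",  # down node + error -> keep
    "node03 reported a CATERR during boot",  # node03 is not down -> skip
    "node47 is idle",                        # down node, but no error -> skip
]

# keep only lines that mention both an error string and a down node
matches = [line for line in lines
           if any(error in line for error in valid_errors)
           and any(node in line for node in down_nodes)]

print(matches)  # ['node12 reported a CATERR during boot']
```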

Upvotes: 1

JanHak

Reputation: 1735

First, you need to find all the logs:

import os
import fnmatch

def find_files(pattern, top_level_dir):
    for path, dirlist, filelist in os.walk(top_level_dir):
        for name in fnmatch.filter(filelist, pattern):
            yield os.path.join(path, name)

For example, to find all *.txt files in current dir:

txtfiles = find_files('*.txt', '.')
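Under the hood, fnmatch.filter does the per-directory name matching; on its own it behaves like this:

```python
import fnmatch

# fnmatch.filter() keeps only the names matching a shell-style pattern
names = ['boot.log', 'messages.txt', 'notes.txt', 'archive.gz']
print(fnmatch.filter(names, '*.txt'))  # ['messages.txt', 'notes.txt']
```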

Then get file objects from the names:

def open_files(filenames):
    for name in filenames:
        yield open(name, 'r', encoding='utf-8')

Finally individual lines from files:

def lines_from_files(files):
    for f in files:
        for line in f:
            yield line
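As an aside, lines_from_files re-implements a standard-library helper: itertools.chain.from_iterable flattens an iterable of iterables the same way (io.StringIO stands in for real file objects in this sketch):

```python
import io
import itertools

files = [io.StringIO("a\nb\n"), io.StringIO("c\n")]

# chain.from_iterable yields from each file in turn, just like the nested loops
print(list(itertools.chain.from_iterable(files)))  # ['a\n', 'b\n', 'c\n']
```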

Since you want to find some errors, the check could look like this:

import re

def find_errors(lines):
    pattern = re.compile('(error|nmi|CATERR)')
    for line in lines:
        if pattern.search(line):
            print(line)  
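One detail worth checking against the real logs: the pattern is case-sensitive, so a line containing "NMI" would only match if the pattern is compiled with re.IGNORECASE:

```python
import re

pattern = re.compile('(error|nmi|CATERR)')
print(bool(pattern.search('disk error on sda')))  # True
print(bool(pattern.search('NMI received')))       # False (case-sensitive)

relaxed = re.compile('(error|nmi|CATERR)', re.IGNORECASE)
print(bool(relaxed.search('NMI received')))       # True
```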

You can now process a stream of lines generated from a given directory:

txt_file_names = find_files('*.txt', '.')
txt_files = open_files(txt_file_names)
txt_lines = lines_from_files(txt_files)
find_errors(txt_lines)

The idea of processing logs as a stream of data comes from a talk by David Beazley.

Upvotes: 1
