mike01010

Reputation: 6048

python to search files and parse using regular expression

I'm new to Python. I am attempting to write a quick-and-dirty Python script to find certain strings in log files and extract certain info from each matching line. The lines in the log file look like this:

2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down
2012-08-01 14:17:10,749 [PDispatcher: ] INFO  Package 1900034442 Queued for clearance.
2012-08-01 14:23:06,998 [PDispatcher: ] ERROR Exception occurred attempting to lookup prod id 90000142

I have a function whose input parameters are a filename and an array of patterns to look for. Currently I can find all lines in the file that contain one or more of the specified patterns (though I'm not sure it's the most efficient way), and I'm able to extract the line number and the line itself.

def searchLogs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            # check if any of the pattern strings exist in this line
            for sPattern in searchPatterns:
                if sPattern in line:
                    foundItem = [fn, sPattern, lineNo, line]
                    res.append(foundItem)
    return res

searchLogs(r"c:\temp\app.log", ["ERROR", "DEF_CON"]) # should return 3 items for the log snippet above (2 for the first line, 1 for the third); note the raw string, since "\t" and "\a" are escape sequences

What I would also like to do is extract the date and time while searching. I was therefore thinking of changing the search patterns to regular expression strings with grouping, so the search would also capture the date. Only one problem: I'm not sure how to do this in Python. Any help would be appreciated.
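For reference, the grouping idea in its most minimal form (the pattern and names here are illustrative, not from the post):

```python
import re

# One anchored pattern: a capturing group for the timestamp, followed by
# one of the example search strings ("DEF_CON" here).
line = "2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down"
m = re.match(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*DEF_CON", line)
if m:
    timestamp = m.group(1)  # "2012-08-01 13:36:40,449"
```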

Edit (solution): With help from Sebastian and the link Joel provided, I've come up with this solution:

import re

def search_logs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            # check if pattern strings exist in line
            for sPattern in searchPatterns:
                # crude regex: match the pattern and, if matched, capture the timestamp
                rex = r'^(.+) \[.*' + sPattern
                ms = re.match(rex, line)
                if ms:
                    time = ms.group(1)
                    item = Structs.MatchedItem(fn, sPattern, lineNo, line, time)
                    res.append(item)
    return res

search_logs(r"c:\temp\app.log", ["ERROR", "DEF_CON"]) # should return 3 items for the log snippet above (2 for the first line, 1 for the third)
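On the efficiency point raised above: one possible refinement (a sketch, not from the original post; the function name is made up) is to compile each pattern's regex once up front instead of rebuilding it on every line:

```python
import re

# Hypothetical variant: precompile one regex per search pattern, each with a
# capturing group for the timestamp prefix before " [".
def search_logs_precompiled(fn, searchPatterns):
    compiled = [(p, re.compile(r'^(.+?) \[.*' + re.escape(p)))
                for p in searchPatterns]
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            for sPattern, rex in compiled:
                ms = rex.match(line)
                if ms:
                    # group(1) is the timestamp, e.g. "2012-08-01 13:36:40,449"
                    res.append((fn, sPattern, lineNo, line, ms.group(1)))
    return res
```

re.escape guards against search strings that happen to contain regex metacharacters; for plain strings like "ERROR" it changes nothing.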

Upvotes: 1

Views: 896

Answers (2)

Antony Thomas

Reputation: 3686

Here are your regular expressions. I have tested the regular expressions but not the full code:

import re

def searchLogs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            # check if pattern strings exist in line
            for sPattern in searchPatterns:
                if sPattern in line:
                    # assumes every matching line starts with a timestamp
                    date = re.search(r'(19|20)\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])', line).group()
                    time = re.search(r'\b([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9]),[0-9][0-9][0-9]', line).group()
                    foundItem = (fn, sPattern, lineNo, date, time, line)  # prefer a tuple over a list
                    res.append(foundItem)
    return res

PS : REs are always a pain in the wrong place. Let me know if you need explanation. :)
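As a quick check, here are the two expressions run against the first sample line from the question (with the month alternation written as `1[0-2]`, so that October through December dates also match):

```python
import re

line = "2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down"
date = re.search(r'(19|20)\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])', line).group()
time = re.search(r'\b([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9]),[0-9][0-9][0-9]', line).group()
# date == "2012-08-01", time == "13:36:40,449"
```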

Upvotes: 1

jfs

Reputation: 414405

There are two parts:

  • extract datetime string
  • parse it into a datetime object

For the latter you could use the datetime.strptime() function:

from datetime import datetime

# line is a single log line such as the ones shown in the question
try:
    dt = datetime.strptime(line.split(" [", 1)[0], "%Y-%m-%d %H:%M:%S,%f")
except ValueError:
    dt = None

The former depends on how regular your log files are and on how fast and robust you want the solution to be. For example, line.split(" [", 1)[0] is fast, but fragile. A more robust variant is:

' '.join(line.split(None, 2)[:2])

but it might be slower.
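Putting the two parts together on one of the sample lines (a sketch; the variable names are illustrative):

```python
from datetime import datetime

line = "2012-08-01 14:17:10,749 [PDispatcher: ] INFO  Package 1900034442 Queued for clearance."

# fast but fragile: assumes " [" appears right after the timestamp
stamp_fast = line.split(" [", 1)[0]

# more robust: take the first two whitespace-separated fields
stamp_robust = ' '.join(line.split(None, 2)[:2])

# %f accepts the milliseconds after the comma as a fractional second
dt = datetime.strptime(stamp_robust, "%Y-%m-%d %H:%M:%S,%f")
# stamp_fast == stamp_robust == "2012-08-01 14:17:10,749"
# dt == datetime(2012, 8, 1, 14, 17, 10, 749000)
```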

Upvotes: 1
