Reputation: 6048
I'm new to Python. I am attempting to write a quick and dirty Python script to find certain strings in log files and extract certain info from those lines. The lines in the log file look like this:
2012-08-01 13:36:40,449 [PDispatcher: ] ERROR Fatal error DEF_CON encountered. Shutting down
2012-08-01 14:17:10,749 [PDispatcher: ] INFO Package 1900034442 Queued for clearance.
2012-08-01 14:23:06,998 [PDispatcher: ] ERROR Exception occurred attempting to lookup prod id 90000142
I have a function whose input parameters are a filename and an array of patterns to look for. Currently I can find all lines within the file that contain one or more of the specified patterns (though I'm not sure it's the most efficient way), and I'm able to extract the line number and line.
def searchLogs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            # check if any pattern string exists in this line
            for sPattern in searchPatterns:
                if sPattern in line:
                    foundItem = [fn, sPattern, lineNo, line]
                    res.append(foundItem)
    return res
searchLogs(r"c:\temp\app.log", ["ERROR", "DEF_CON"]) # this should return 3 elements based on the above log snippet (2 for the first line and 1 for the third line)
What I would like to do as well is extract the date and time while searching. I was therefore thinking of changing each search pattern to a regular-expression string with a group that captures the date. Only one problem: I'm not sure how to do this in Python... any help would be appreciated.
Edit (solution): with help from Sebastian and the link Joel provided, I've come up with this solution:
import re

def search_logs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            # check if pattern strings exist in line
            for sPattern in searchPatterns:
                # crude regex to match the pattern and, if matched, 'group' the timestamp
                rex = r'^(.+) \[.*' + sPattern
                ms = re.match(rex, line)
                if ms:
                    time = ms.group(1)
                    item = Structs.MatchedItem(fn, sPattern, lineNo, line, time)
                    res.append(item)
    return res
search_logs(r"c:\temp\app.log", ["ERROR", "DEF_CON"]) # this should return 3 elements based on the above log snippet (2 for the first line and 1 for the third line)
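As an aside, since the same regex is rebuilt for every line, compiling each pattern once before the loop avoids that repeated work. A minimal sketch of that variant (it returns plain tuples instead of the `Structs.MatchedItem` class, which is specific to my project, and uses `re.escape` in case a pattern contains regex metacharacters):

```python
import re

def search_logs(fn, search_patterns):
    # Pre-compile one regex per pattern; the first group captures the timestamp.
    compiled = [(p, re.compile(r'^(.+?) \[.*' + re.escape(p)))
                for p in search_patterns]
    res = []
    with open(fn) as f:
        for line_no, line in enumerate(f, 1):
            for pattern, rex in compiled:
                ms = rex.match(line)
                if ms:
                    res.append((fn, pattern, line_no, line, ms.group(1)))
    return res
```

The non-greedy `(.+?)` stops the timestamp group at the first ` [` rather than the last one, which matters if a line contains more than one bracket.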
Upvotes: 1
Views: 896
Reputation: 3686
Here are your regular expressions. I have tested the regular expressions but not the full code:
import re

def searchLogs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            # check if pattern strings exist in line
            for sPattern in searchPatterns:
                if sPattern in line:
                    date = re.search(r'(19|20)\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])', line).group()
                    time = re.search(r'\b([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9]),[0-9]{3}', line).group()
                    foundItem = (fn, sPattern, lineNo, date, time, line)  # prefer a tuple over a list
                    res.append(foundItem)
    return res
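If you prefer, the two expressions can also be combined into a single anchored pattern with named groups, so the line is scanned only once (a sketch; the group names `date` and `time` are illustrative):

```python
import re

# One anchored pattern capturing date and time together.
TIMESTAMP = re.compile(
    r'^(?P<date>(19|20)\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])) '
    r'(?P<time>([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9]),[0-9]{3})'
)

m = TIMESTAMP.match("2012-08-01 13:36:40,449 [PDispatcher: ] ERROR Fatal error")
date, time = m.group("date"), m.group("time")
```

Anchoring with `^` also guarantees the timestamp is taken from the start of the line, not from a date that happens to appear later in the message text.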
PS : REs are always a pain in the wrong place. Let me know if you need explanation. :)
Upvotes: 1
Reputation: 414405
There are two parts: extracting the date/time string from a line, and converting it to a datetime object.
For the latter you could use the datetime.strptime()
function:
from datetime import datetime

try:
    dt = datetime.strptime(line.split(" [", 1)[0], "%Y-%m-%d %H:%M:%S,%f")
except ValueError:
    dt = None
The former depends on how regular your log files are and how fast and robust you want the solution to be; e.g., line.split(" [", 1)[0]
is fast, but fragile. A more robust alternative is:
' '.join(line.split(None, 2)[:2])
but it might be slower.
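Putting both parts together, a minimal sketch (the helper name `parse_timestamp` is just for illustration):

```python
from datetime import datetime

def parse_timestamp(line):
    # Take the first two whitespace-separated fields ("2012-08-01" and
    # "13:36:40,449") and parse them; return None if the line has no timestamp.
    stamp = ' '.join(line.split(None, 2)[:2])
    try:
        return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None

dt = parse_timestamp("2012-08-01 14:17:10,749 [PDispatcher: ] INFO Package queued")
```

Note that `%f` pads the parsed digits on the right, so the millisecond field `749` comes back as 749000 microseconds.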
Upvotes: 1