Reputation: 431
Hi, I have a log file with the following content:
[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]
ViewPostImeInputStage processKey 0
[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]
active stream is 0x8
[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ]
isSafeVolumeDialogShowing : false
I want to extract some information from the log file. The expected format is:
[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'),
('06-15 14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'),
('06-15 14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')]
Question: What is the best Python regex for extracting the information in the expected format? Thanks very much!
Update: I have tried the code below:
import re
regex = r"(\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s(\d+).*(\w{1})/(.*)\](.*)"
data = [g.groups() for g in re.finditer(regex, log, re.M | re.I)]
The result I have got is
data=[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', '\r'),
('06-15 14:07:48.397', '3539', 'D', 'AudioService', '\r'),
('06-15 14:07:48.407', '4277', 'D', 'vol.VolumeDialogControl.VC', '\r')]
I can't get the last element.
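A likely reason, shown as a minimal sketch: in Python's re module, '.' does not match '\n' unless re.DOTALL is set, so the final (.*) can only capture what remains on the header line. The sample string below assumes Windows line endings ("\r\n"), which the '\r' in the result suggests; the names log, attempt and groups are only illustrative.

```python
import re

# One record with "\r\n" line endings (an assumption based on the '\r' in the result).
log = "[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]\r\nViewPostImeInputStage processKey 0\r\n"

# The pattern from the attempt above, unchanged.
attempt = r"(\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s(\d+).*(\w{1})/(.*)\](.*)"
groups = re.search(attempt, log, re.M | re.I).groups()

# '.' never crosses '\n', so the last group holds only the rest of the header line.
print(repr(groups[-1]))  # '\r'
```

The message text sits on the next line, so the pattern has to consume the line break explicitly before it can capture the message.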
Upvotes: 1
Views: 84
Reputation: 3581
#!/usr/bin/python2
# -*- coding: utf-8 -*-
import re
input = """
[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]
ViewPostImeInputStage processKey 0
[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]
active stream is 0x8
[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ]
isSafeVolumeDialogShowing : false
"""
# join each header line with the message line that follows it
input = re.sub(r'(\])\s+', r'\1 ', input)
# replace "D/Something ] " -> "D Something "
input = re.sub(r'([A-Z])/(\S+)\s+\]\s+', r'\1 \2 ', input)
# remove the leading "[ " before the date
input = re.sub(r'\[\s+([0-9]{2}-[0-9]{2})', r'\1', input)
print(input)
Output:
06-15 14:07:48.377 15012:15012 D ViewRootImpl ViewPostImeInputStage processKey 0
06-15 14:07:48.397 3539: 4649 D AudioService active stream is 0x8
06-15 14:07:48.407 4277: 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false
Upvotes: 1
Reputation: 92854
Use the following approach:
import re

with open('yourlogfile', 'r') as log:
    lines = log.read()

# the pattern stops before the trailing line break, so each record stays on its own line
result = re.sub(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)',
                r'\1 \2 \3 \4 \5 \6 \7', lines, flags=re.MULTILINE)
print(result)
The output:
06-15 14:07:48.377 15012 15012 D ViewRootImpl ViewPostImeInputStage processKey 0
06-15 14:07:48.397 3539 4649 D AudioService active stream is 0x8
06-15 14:07:48.407 4277 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false
To get the result as a list of matches, use the re.findall() function:
...
result = re.findall(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)\n?', lines, flags=re.MULTILINE)
print(result)
The output:
[('06-15', '14:07:48.377', '15012', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'), ('06-15', '14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'), ('06-15', '14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')]
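The question asks for the date and time as a single element, and the '\r' in the asker's result hints at "\r\n" line endings. The following is a hedged sketch of how the pattern above might be adapted for that; RECORD and records are hypothetical names, the sample text is copied from the question, and the "\r\n" endings are an assumption.

```python
import re

# Sample records from the question; the "\r\n" endings are an assumption.
log = (
    "[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]\r\n"
    "ViewPostImeInputStage processKey 0\r\n"
    "[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]\r\n"
    "active stream is 0x8\r\n"
    "[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ]\r\n"
    "isSafeVolumeDialogShowing : false\r\n"
)

# Variant of the pattern above: date and time share one group,
# and an optional '\r' is tolerated before each line break.
RECORD = re.compile(
    r'^\[ (\S+ \S+) +(\d+): *(\d+) +([A-Z]+)/(\S+) \]\r?\n([^\r\n]+)',
    flags=re.MULTILINE,
)

records = RECORD.findall(log)
for record in records:
    print(record)
```

Each tuple then holds the timestamp, the two numeric IDs, the level, the tag and the message, in that order. Because of the optional '\r', the same pattern should also work on files with plain "\n" endings.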
Upvotes: 2