Reputation: 69
Can someone please help me with the following problem? I have a log file with thousands of lines like these:
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537
jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018
jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338
I would like to write a Python script that iterates through this file, groups the lines by jarid (the second field on each line), and prints all the timestamps found for that jarid on a single line. For example, for these two lines:
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537
I would like to get this output:
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 00:00:02,217 ack: 00:00:04,537
I think the best way to do this is with a dictionary (or maybe not; please comment). I have written the following script, which partly works but does not give me the desired output:
#!/opt/SP/bin/python
log = file("/opt/SP/logs/generic.log", "r")
filecontent = log.xreadlines()
storage = {}
for line in filecontent:
    line = line.strip()
    jarid, JARID, status, STATUS, timestamp, TIME = line.split(" ")
    if JARID not in storage:
        storage[JARID] = {}
    if STATUS not in storage[JARID]:
        storage[JARID][STATUS] = {}
    if TIME not in storage[JARID][STATUS]:
        storage[JARID][STATUS][TIME] = {}
jarids = storage.keys()
jarids.sort()
for JARID in jarids:
    stats = storage[JARID].keys()
    stats.sort()
    for STATUS in stats:
        times = storage[JARID][STATUS].keys()
        times.sort()
        for TIME in times:
            all = storage[JARID][STATUS][TIME].keys()
            all.sort()
for JARID in jarids:
    if "1" in storage[JARID].keys() and "13" in storage[JARID].keys():
        print "MSG: %s, RECV: %s, ACK: %s" % (JARID, storage[JARID]["1"], storage[JARID]["13"])
    else:
        if "1" in storage[JARID].keys() and "14" in storage[JARID].keys():
            print "MSG: %s, RECV: %s, NACK: %s" % (JARID, storage[JARID]["1"], storage[JARID]["14"])
When I run this script, I get this output:
MSG: 7e5ae720-9151-11e0-eff2-00238bce4216, RECV: {'00:00:02,217': {}}, ACK: {'00:00:04,537': {}}
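The nested braces in that output come from the last assignment in the loop, which stores an empty dict under each timestamp, so storage[JARID]["1"] is {'00:00:02,217': {}} rather than the timestamp string. A minimal sketch of the same loop that stores the timestamp directly, keyed by the action name (recv, ack, nack) instead of the numeric code. It is written for Python 3 and assumes the four sample lines stand in for the real file:

```python
# Sample lines from the question; in practice, loop over the open log file instead.
lines = [
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217",
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338",
]

storage = {}
for line in lines:
    jarid, JARID, status, STATUS, timestamp, TIME = line.split()
    # Key by the action name ("recv", "ack", "nack") rather than the numeric
    # code in STATUS, and store the timestamp string itself instead of {}.
    storage.setdefault(JARID, {})[status.rstrip(":")] = TIME

print(storage["7e5ae720-9151-11e0-eff2-00238bce4216"])
```

With one flat dict per jarid, printing a line is just a matter of reading storage[JARID]["recv"] and storage[JARID]["ack"] (or "nack").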
Please note that I am still learning Python and that my scripting skills are not all that!
Can you help me figure out how to get the desired output shown above?
Upvotes: 2
Views: 2964
Reputation: 33397
That should work.
Using:
log = ['jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217',
'jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537',
'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018',
'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338']
you can do:
d = {}
for i in (line.split() for line in log):
    d.setdefault(i[1], {}).update({i[2]: i[-1]})
# as pointed out by @gnibbler, you can also use "defaultdict"
# instead of dict with "setdefault"
Then you can print it with:
for i, j in d.items():
    print 'jarid:', i,
    for k, m in j.items():
        print k, m,
    print
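The trailing-comma print statements above only work in Python 2, and a Python 2 dict does not guarantee that recv is printed before ack. A possible Python 3 port of the same setdefault approach, building each output line as a string. It relies on dicts preserving insertion order, which Python guarantees from 3.7 on:

```python
log = [
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217",
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338",
]

d = {}
for i in (line.split() for line in log):
    # i[1] is the jarid, i[2] the action label with its colon ("recv:"), i[-1] the timestamp
    d.setdefault(i[1], {}).update({i[2]: i[-1]})

output = []
for jarid, events in d.items():
    # Dicts keep insertion order in Python 3.7+, so "recv:" comes out before
    # "ack:"/"nack:" whenever it appears first in the log.
    fields = " ".join("%s %s" % (action, ts) for action, ts in events.items())
    output.append("jarid: %s %s" % (jarid, fields))

print("\n".join(output))
```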
Upvotes: 0
Reputation: 22619
This solution is somewhat similar to @JBernardo's, though I chose to parse the lines with a regular expression. I've written it now, so I may as well publish it; it might be of some use.
import re
line_pattern = re.compile(
r"jarid: (?P<jarid>[a-z0-9\-]+) (?P<action>[a-z]+): (?P<status>[0-9]+) timestamp: (?P<ts>[0-9\:,]+)"
)
infile = open('/path/to/file.log')
entries = (line_pattern.match(line).groupdict() for line in infile)
events = {}
for entry in entries:
    event = events.setdefault(entry['jarid'], {})
    event[entry['action']] = entry['ts']
for jarid, event in events.iteritems():
    ack_event = 'ack' if 'ack' in event else 'nack' if 'nack' in event else None
    print 'jarid: %s recv: %s %s: %s' % (jarid, event.get('recv'), ack_event, event.get(ack_event))
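One caveat with the generator above: line_pattern.match() returns None for any line that does not fit the pattern (a blank line, a header, a different message type), and calling .groupdict() on None raises AttributeError. A Python 3 sketch that skips such lines; the helper name collect_events is hypothetical:

```python
import re

line_pattern = re.compile(
    r"jarid: (?P<jarid>[a-z0-9\-]+) (?P<action>[a-z]+): (?P<status>[0-9]+) timestamp: (?P<ts>[0-9:,]+)"
)


def collect_events(lines):
    """Group timestamps by jarid, skipping lines the pattern does not match."""
    events = {}
    for line in lines:
        match = line_pattern.match(line)
        if match is None:  # e.g. a blank line or an unrelated log message
            continue
        entry = match.groupdict()
        events.setdefault(entry["jarid"], {})[entry["action"]] = entry["ts"]
    return events


sample = [
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217",
    "",  # would crash the original generator expression
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537",
]
events = collect_events(sample)
print(events)
```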
Upvotes: 0
Reputation: 304205
Based on JBernardo's answer, but using defaultdict instead of setdefault. You can print it exactly the same way, so I won't copy that code here.
from collections import defaultdict
log = ['jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217',
'jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537',
'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018',
'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338']
d = defaultdict(dict)
for i in (line.split() for line in log):
    d[i[1]][i[2]] = i[-1]
You can also unpack into meaningful names, for example:
for label1, jarid, jartype, x, label2, timestamp in (line.split() for line in log):
    d[jarid][jartype] = timestamp
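A Python 3 sketch of the unpacking version that uses _ for the fields it ignores and skips lines that do not split into exactly six fields, since the six-name unpacking would otherwise raise ValueError on such a line. As in the answers above, the action keys keep their trailing colon ("recv:"):

```python
from collections import defaultdict

log = [
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217",
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338",
    "some unrelated line",
]

d = defaultdict(dict)  # a missing jarid is created as an empty dict on first access
for fields in (line.split() for line in log):
    if len(fields) != 6:  # skip lines that do not have the expected six fields
        continue
    _, jarid, jartype, _, _, timestamp = fields
    d[jarid][jartype] = timestamp

print(dict(d))
```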
Upvotes: 2
Reputation: 208485
Here is a regex solution:
import re
pattern = re.compile(r"""jarid:\s(\S+)        # save jarid to group 1
                         \s(recv:)\s\d+       # save 'recv:' to group 2
                         \stimestamp:\s(\S+)  # save recv timestamp to group 3
                         .*?jarid:\s\1        # make sure a later line has the same jarid
                         \s(n?ack:)\s\d+      # save 'ack:' or 'nack:' to group 4
                         \stimestamp:\s(\S+)  # save ack timestamp to group 5
                         """, re.VERBOSE | re.DOTALL | re.MULTILINE)
log = open("/opt/SP/logs/generic.log").read()  # finditer() needs the whole file as one string
for content in pattern.finditer(log):
    print "jarid: " + " ".join(content.groups())
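pattern.finditer() works on a single string, so the log has to be the entire file contents (for example from f.read()), not a file object or a list of lines. A Python 3 sketch that reads the file first; pair_events is a hypothetical helper name, and the temporary file only stands in for the real log. Because each match consumes the text up to the ack/nack line, a recv whose ack/nack comes after another jarid's recv can be skipped, so this approach suits logs where the pairs are not interleaved:

```python
import os
import re
import tempfile

pattern = re.compile(r"""jarid:\s(\S+)        # group 1: jarid
                         \s(recv:)\s\d+       # group 2: 'recv:'
                         \stimestamp:\s(\S+)  # group 3: recv timestamp
                         .*?jarid:\s\1        # a later line with the same jarid
                         \s(n?ack:)\s\d+      # group 4: 'ack:' or 'nack:'
                         \stimestamp:\s(\S+)  # group 5: ack/nack timestamp
                      """, re.VERBOSE | re.DOTALL)


def pair_events(path):
    # Read the whole file into one string so the pattern can span lines.
    with open(path) as f:
        text = f.read()
    return ["jarid: " + " ".join(m.groups()) for m in pattern.finditer(text)]


sample = (
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217\n"
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537\n"
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018\n"
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write(sample)
try:
    results = pair_events(tmp.name)
finally:
    os.remove(tmp.name)

print("\n".join(results))
```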
Upvotes: 0
Reputation: 6699
I wouldn't make status a dictionary. Instead, I would just store the timestamp for each status key in your jarid dictionary. It's better explained with an example...
def search_jarids(jarid):
    stored_jarid = storage[jarid]
    entry = "jarid: %s" % jarid
    for status in stored_jarid:
        entry += " %s: %s" % (status, stored_jarid[status])
    return entry
with open("yourlog.log", 'r') as log:
    lines = log.readlines()
storage = {}
for line in lines:
    line = line.strip()
    jarid_tag, jarid, status_tag, status, timestamp_tag, timestamp = line.split(" ")
    if jarid not in storage:
        storage[jarid] = {}
    status_tag = status_tag[:-1]
    storage[jarid][status_tag] = timestamp
print search_jarids("462c6d11-9151-11e0-a72c-00238bbdc9e7")
This would give you:
jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 00:00:10,338 recv: 00:00:08,018
Hope it gets you started.
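search_jarids reads the module-level storage dict and emits statuses in whatever order the dict yields them, which is likely why nack appears before recv in the output above. A hypothetical Python 3 variant that takes the dictionary as an argument and lists statuses in a fixed order (the order parameter is an illustration, not part of the original answer):

```python
def search_jarids(storage, jarid, order=("recv", "ack", "nack")):
    """Format one jarid's statuses, always in the given order."""
    stored_jarid = storage[jarid]
    entry = "jarid: %s" % jarid
    for status in order:
        if status in stored_jarid:  # a jarid has either ack or nack, not both
            entry += " %s: %s" % (status, stored_jarid[status])
    return entry


# Same shape as the storage dict built in the answer above.
storage = {
    "462c6d11-9151-11e0-a72c-00238bbdc9e7": {"nack": "00:00:10,338", "recv": "00:00:08,018"},
}
result = search_jarids(storage, "462c6d11-9151-11e0-a72c-00238bbdc9e7")
print(result)
```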
Upvotes: 0