Reputation: 640
I have many log files with lines in a format like:
2012-09-12 23:12:00 other logs here
and I need to extract the time string and compare the time delta between two log records. I did that with this:
for line in log:
    l = line.strip().split()
    timelist = [int(n) for n in re.split("[- :]", l[0] + ' ' + l[1])]
    # now timelist looks like [2012, 9, 12, 23, 12, 0]
Then when I have two records:
d1 = datetime.datetime(timelist1[0], timelist1[1], timelist1[2], timelist1[3], timelist1[4], timelist1[5])
d2 = datetime.datetime(timelist2[0], timelist2[1], timelist2[2], timelist2[3], timelist2[4], timelist2[5])
delta = (d2-d1).seconds
The problem is that it runs slowly. Is there any way to improve the performance? Thanks in advance.
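As an aside on the snippet above: `(d2 - d1).seconds` only returns the seconds component of the timedelta and silently drops full days; `timedelta.total_seconds()` covers the whole span. A minimal sketch with made-up timestamps:

```python
import datetime

d1 = datetime.datetime(2012, 9, 12, 23, 12, 0)
d2 = datetime.datetime(2012, 9, 14, 1, 0, 0)  # more than a day later

delta = d2 - d1
print(delta.seconds)          # 6480 -- only the seconds part of "1 day, 1:48:00"
print(delta.total_seconds())  # 92880.0 -- the full span in seconds
```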
Upvotes: 1
Views: 206
Reputation: 20339
You can also try it without a regexp, using the optional maxsplit argument of split:
(date, time, log) = line.split(" ", 2)
timerecord = datetime.datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
and then it'd be a matter of computing your timedeltas between consecutive timerecords.
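Putting the approach from this answer into a runnable loop (the sample log lines are hypothetical, in the question's format):

```python
import datetime

log = [
    "2012-09-12 23:12:00 other logs here",
    "2012-09-12 23:12:30 more logs here",
]

prev = None
for line in log:
    # maxsplit=2: split off date and time, leave the rest of the line intact
    date, time, rest = line.split(" ", 2)
    timerecord = datetime.datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    if prev is not None:
        print((timerecord - prev).total_seconds())  # 30.0
    prev = timerecord
```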
Upvotes: 1
Reputation: 23575
You could do it entirely with regular expressions, which might be faster.
find_time = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")
for line in log:
    timelist = find_time.match(line)
    if timelist:
        d = datetime.datetime(*map(int, timelist.groups()))
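A self-contained version of the compiled-regex approach, applied to a single hypothetical line:

```python
import datetime
import re

# Raw string so the \d escapes reach the regex engine untouched
find_time = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")

line = "2012-09-12 23:12:00 other logs here"
m = find_time.match(line)
if m:
    # groups() yields the six captured strings; map(int, ...) converts them
    d = datetime.datetime(*map(int, m.groups()))
    print(d)  # 2012-09-12 23:12:00
```

Compiling the pattern once outside the loop avoids re-parsing the regex on every line.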
Upvotes: 1
Reputation: 298276
You could get rid of the regex and use map:
date_time = datetime.datetime
for line in log:
    date, time = line.strip().split(' ', 2)[:2]
    timelist = map(int, date.split('-') + time.split(':'))
    d = date_time(*timelist)
.split(' ', 2) will be faster than just .split() because it only splits up to two times and only on spaces, not on any whitespace. map(int, l) is faster than [int(x) for x in l], the last time I checked. You may also be able to drop the .strip().
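The split-and-map parsing above, run on one hypothetical line (note that in Python 3 `map` returns an iterator, but `*`-unpacking it into the constructor still works):

```python
import datetime

line = "2012-09-12 23:12:00 other logs here"

# Split off date and time; the trailing log text lands in the discarded third field
date, time = line.split(' ', 2)[:2]
timelist = map(int, date.split('-') + time.split(':'))
d = datetime.datetime(*timelist)
print(d)  # 2012-09-12 23:12:00
```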
Upvotes: 1