Reputation: 413
I have a pandas time series sampled at 30-minute intervals. A small sample looks like:
2009-12-02 20:00:00 0.6
2009-12-02 20:30:00 0.7
2009-12-03 01:00:00 0.7
2009-12-03 02:30:00 0.7
2009-12-03 11:30:00 0.7
2009-12-03 12:00:00 1.4
2009-12-03 12:30:00 1.3
I need to identify the start and end timestamp of each event (and store them), where a new event begins whenever the gap between two consecutive timestamps is more than 2 hours. For example:
event 1 : 2009-12-02 20:00:00 - 2009-12-02 20:30:00
event 2 : 2009-12-03 01:00:00 - 2009-12-03 02:30:00
event 3 : 2009-12-03 11:30:00 - 2009-12-03 12:30:00
But I am a bit stuck here. If it were a DataFrame, I would usually use something like:
for index, row in df.iterrows():
    # if timedelta > 2 hours, etc.
Any suggestions on how I could start? Thanks.
Upvotes: 2
Views: 188
Reputation: 97331
Here is the code:
import io

import numpy as np
import pandas as pd

data = r"""date,value
2009-12-02 20:00:00,0.6
2009-12-02 20:30:00,0.7
2009-12-03 01:00:00,0.7
2009-12-03 02:30:00,0.7
2009-12-03 11:30:00,0.7
2009-12-03 12:00:00,1.4
2009-12-03 12:30:00,1.3"""
df = pd.read_csv(io.StringIO(data), parse_dates=[0])

# Gap between each timestamp and the previous one (NaT for the first row).
diff = df.date - df.date.shift(1)
# Every gap longer than 2 hours starts a new section; cumsum numbers the sections.
sections = (diff > np.timedelta64(2, "h")).astype(int).cumsum()

def f(s):
    # Keep only the first and last timestamp of a section.
    return s.iloc[[0, -1]].reset_index(drop=True)

print(df.date.groupby(sections).apply(f).unstack())
output:
                    0                   1
0 2009-12-02 20:00:00 2009-12-02 20:30:00
1 2009-12-03 01:00:00 2009-12-03 02:30:00
2 2009-12-03 11:30:00 2009-12-03 12:30:00
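Since the question is about a Series rather than a DataFrame, here is a minimal sketch of the same diff/cumsum idea applied directly to the index. It assumes the series has a sorted DatetimeIndex; the function name find_events and the column names start and end are made up for illustration.

```python
import pandas as pd


def find_events(ts, max_gap="2h"):
    """Return one row per event with its first and last timestamp.

    Assumes ts is a Series with a sorted DatetimeIndex.
    """
    stamps = ts.index.to_series()
    # A new event starts wherever the gap to the previous timestamp exceeds max_gap.
    new_event = stamps.diff() > pd.Timedelta(max_gap)
    # The cumulative sum of the boolean flags gives each event its own number.
    event_id = new_event.cumsum()
    bounds = stamps.groupby(event_id).agg(["first", "last"])
    return bounds.rename(columns={"first": "start", "last": "end"})


index = pd.to_datetime([
    "2009-12-02 20:00:00", "2009-12-02 20:30:00",
    "2009-12-03 01:00:00", "2009-12-03 02:30:00",
    "2009-12-03 11:30:00", "2009-12-03 12:00:00", "2009-12-03 12:30:00",
])
ts = pd.Series([0.6, 0.7, 0.7, 0.7, 0.7, 1.4, 1.3], index=index)
events = find_events(ts)
print(events)
```

The result is a DataFrame, so the start and end of each event are stored and can be reused later instead of only being printed.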
Upvotes: 3
Reputation: 2198
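This assumes the log data is in t.txt, one timestamped entry per line:
from datetime import datetime, timedelta

d1 = d2 = None  # start and most recent timestamp of the current event
with open('t.txt') as log:
    for line in log:
        # The first 19 characters hold the timestamp, e.g. "2009-12-02 20:00:00".
        d3 = datetime.strptime(line[:19], '%Y-%m-%d %H:%M:%S')
        if d1 is None:
            d1 = d2 = d3
        # Compare the whole timedelta: .seconds ignores the days part of the gap.
        elif d3 - d2 > timedelta(hours=2):
            # Gap of more than 2 hours: report the finished event, start a new one.
            print(d1, d2)
            d1 = d2 = d3
        else:
            d2 = d3
print(d1, d2)  # the last event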
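If the events need to be stored rather than printed, the same loop can be wrapped in a function that collects (start, end) pairs. This is only a sketch: the name split_events and its max_gap parameter are hypothetical, and it assumes the timestamps arrive in ascending order.

```python
from datetime import datetime, timedelta


def split_events(timestamps, max_gap=timedelta(hours=2)):
    """Group sorted timestamps into (start, end) pairs separated by gaps > max_gap."""
    found = []
    start = prev = None
    for t in timestamps:
        if start is None:
            start = prev = t
        elif t - prev > max_gap:
            # The gap is too long: close the current event and open a new one.
            found.append((start, prev))
            start = prev = t
        else:
            prev = t
    if start is not None:
        # Close the event that was still open when the input ended.
        found.append((start, prev))
    return found


raw = [
    "2009-12-02 20:00:00", "2009-12-02 20:30:00",
    "2009-12-03 01:00:00", "2009-12-03 02:30:00",
    "2009-12-03 11:30:00", "2009-12-03 12:00:00", "2009-12-03 12:30:00",
]
stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]
event_list = split_events(stamps)
for number, (start, end) in enumerate(event_list, 1):
    print("event", number, ":", start, "-", end)
```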
Upvotes: 0