pandas timeseries identification values based on date index

Question

I have a pandas time series at 30-minute intervals. A small sample looks like this:

2009-12-02 20:00:00    0.6
2009-12-02 20:30:00    0.7
2009-12-03 01:00:00    0.7
2009-12-03 02:30:00    0.7
2009-12-03 11:30:00    0.7
2009-12-03 12:00:00    1.4
2009-12-03 12:30:00    1.3

I need to identify the start and end date of each event (and store them), where a new event begins whenever the gap between two consecutive timestamps is more than 2 hours. For example:

event 1: 2009-12-02 20:00:00 - 2009-12-02 20:30:00
event 2: 2009-12-03 01:00:00 - 2009-12-03 02:30:00
event 3: 2009-12-03 11:30:00 - 2009-12-03 12:30:00

But I am a bit stuck here. Usually, with a DataFrame I would use something like: for index, row in df.iterrows(): # if timedelta > 2 hours, etc.

Any suggestions on how I could start? Thanks.
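The row-by-row loop idea from the question can work for this problem. A minimal sketch, assuming the timestamps are sorted and the series is non-empty, that walks over consecutive timestamps in the index and stores each event as a (start, end) tuple; the names `max_gap` and `events` are illustrative, not from the question:

```python
import pandas as pd

# Sample data from the question, with the timestamps as the index
s = pd.Series(
    [0.6, 0.7, 0.7, 0.7, 0.7, 1.4, 1.3],
    index=pd.to_datetime([
        "2009-12-02 20:00:00", "2009-12-02 20:30:00",
        "2009-12-03 01:00:00", "2009-12-03 02:30:00",
        "2009-12-03 11:30:00", "2009-12-03 12:00:00",
        "2009-12-03 12:30:00",
    ]),
)

max_gap = pd.Timedelta(hours=2)
events = []  # list of (start, end) tuples

# Assumes a sorted, non-empty index
start = prev = s.index[0]
for ts in s.index[1:]:
    if ts - prev > max_gap:
        # Gap too large: close the current event and open a new one at ts
        events.append((start, prev))
        start = ts
    prev = ts
events.append((start, prev))  # close the last event

for i, (begin, end) in enumerate(events, 1):
    print(f"event {i}: {begin} - {end}")
```

A plain Python loop is easy to follow but slow on long series; the vectorized approach in the answer below avoids it.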

HYRY · Accepted Answer

1. Calculate the time difference between consecutive rows.
2. Get a boolean series by comparing each difference with 2 hours.
3. cumsum() the boolean series to get a series of group labels that splits the DataFrame into events.
4. Call groupby() and apply() to get the start and end datetime of every group.
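The first three steps can be inspected on their own. A minimal sketch that prints the intermediate values for the sample timestamps; it uses `Series.diff()` and `pd.Timedelta` as equivalents of the shift-and-subtract and `np.timedelta64` used in the answer, and the column names in the printed table are illustrative:

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    "2009-12-02 20:00:00", "2009-12-02 20:30:00",
    "2009-12-03 01:00:00", "2009-12-03 02:30:00",
    "2009-12-03 11:30:00", "2009-12-03 12:00:00",
    "2009-12-03 12:30:00",
]), name="date")

# Step 1: time difference to the previous row (NaT for the first row)
diff = times.diff()
# Step 2: True where the gap is longer than 2 hours (NaT compares as False)
is_new_event = diff > pd.Timedelta(hours=2)
# Step 3: the running count of True flags gives each row its event label
sections = is_new_event.cumsum()

print(pd.DataFrame({"date": times, "diff": diff,
                    "new_event": is_new_event, "section": sections}))
```

Each row whose gap exceeds 2 hours bumps the label by one, so rows belonging to the same event share the same integer.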

Here is the code:

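import io

import numpy as np
import pandas as pd

data = r"""date,value
2009-12-02 20:00:00,0.6
2009-12-02 20:30:00,0.7
2009-12-03 01:00:00,0.7
2009-12-03 02:30:00,0.7
2009-12-03 11:30:00,0.7
2009-12-03 12:00:00,1.4
2009-12-03 12:30:00,1.3"""

df = pd.read_csv(io.StringIO(data), parse_dates=[0])

# Time difference between each timestamp and the previous one (NaT for the first row)
diff = df.date - df.date.shift(1)

# True where the gap exceeds 2 hours; the running sum gives one integer label per event
sections = (diff > np.timedelta64(2, "h")).astype(int).cumsum()

def f(s):
    # Keep the first and last timestamp of the group, re-indexed as 0 and 1
    return s.iloc[[0, -1]].reset_index(drop=True)

# Python 3 print; unstack() turns the 0/1 positions into columns
print(df.date.groupby(sections).apply(f).unstack())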

output:

                   0                   1
0 2009-12-02 20:00:00 2009-12-02 20:30:00
1 2009-12-03 01:00:00 2009-12-03 02:30:00
2 2009-12-03 11:30:00 2009-12-03 12:30:00
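The question's series keeps the timestamps in the index rather than in a column. A minimal sketch of the same idea under that assumption, using `groupby(...).agg(["first", "last"])` in place of the custom `apply` function; the column names `start`/`end` and the index name `event` are illustrative choices:

```python
import pandas as pd

# Sample data with the timestamps as a DatetimeIndex, as in the question
s = pd.Series(
    [0.6, 0.7, 0.7, 0.7, 0.7, 1.4, 1.3],
    index=pd.to_datetime([
        "2009-12-02 20:00:00", "2009-12-02 20:30:00",
        "2009-12-03 01:00:00", "2009-12-03 02:30:00",
        "2009-12-03 11:30:00", "2009-12-03 12:00:00",
        "2009-12-03 12:30:00",
    ]),
)

# Turn the index into a Series so diff() can be applied to the timestamps
ts = s.index.to_series()

# One integer label per event: increment whenever the gap exceeds 2 hours
sections = (ts.diff() > pd.Timedelta(hours=2)).cumsum()

# First and last timestamp of each group give the event's start and end;
# grouping by the raw label array avoids any index alignment
events = ts.groupby(sections.to_numpy()).agg(["first", "last"])
events.columns = ["start", "end"]
events.index.name = "event"
print(events)
```

`first` and `last` rely on the timestamps being sorted; with unsorted data, call `s.sort_index()` first.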
