Stavros Anastasiadis

Reputation: 413

pandas time series: identifying events based on gaps in the date index

I have a pandas 30min interval timeseries. A small sample looks like:

2009-12-02 20:00:00    0.6
2009-12-02 20:30:00    0.7
2009-12-03 01:00:00    0.7
2009-12-03 02:30:00    0.7
2009-12-03 11:30:00    0.7
2009-12-03 12:00:00    1.4
2009-12-03 12:30:00    1.3

I need to identify (and store) the start and end timestamps of each event, where a new event begins whenever the gap between two consecutive timestamps is more than 2 hours. For example:

event 1 : 2009-12-02 20:00:00 -  2009-12-02 20:30:00 
event 2 : 2009-12-03 01:00:00 -  2009-12-03 02:30:00  
event 3 : 2009-12-03 11:30:00 -  2009-12-03 12:30:00  

But I am a bit stuck here. Usually, if it were a DataFrame, I would use something like: for index, row in df.iterrows(): # if timedelta > 2 hours, etc.

Any suggestions on how I could start? Thanks.

Upvotes: 2

Views: 188

Answers (2)

HYRY

Reputation: 97331

  1. Calculate the time difference between each row and the previous one.
  2. Get a boolean series by comparing that difference with 2 hours.
  3. Call cumsum() on the boolean series to get a series of group labels that splits the DataFrame.
  4. Call groupby and apply to get the begin and end datetime for every group.

Here is the code:

import pandas as pd
import io
import numpy as np

data = r"""date,value
2009-12-02 20:00:00,0.6
2009-12-02 20:30:00,0.7
2009-12-03 01:00:00,0.7
2009-12-03 02:30:00,0.7
2009-12-03 11:30:00,0.7
2009-12-03 12:00:00,1.4
2009-12-03 12:30:00,1.3"""

df = pd.read_csv(io.StringIO(data), parse_dates=[0])
diff = df.date - df.date.shift(1)

sections = (diff > np.timedelta64(2, "h")).astype(int).cumsum()

def f(s):
    return s.iloc[[0, -1]].reset_index(drop=True)

print(df.date.groupby(sections).apply(f).unstack())

Output:

                   0                   1
0 2009-12-02 20:00:00 2009-12-02 20:30:00
1 2009-12-03 01:00:00 2009-12-03 02:30:00
2 2009-12-03 11:30:00 2009-12-03 12:30:00
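
Since the question describes a Series with a date index rather than a DataFrame with a date column, the same four steps can also be sketched directly on the index. This is only an illustration: the name find_events, the max_gap parameter, and the start/end column names are assumptions made here and are not part of the answer above.

```python
import pandas as pd

# Sample series with a DatetimeIndex, copied from the question.
s = pd.Series(
    [0.6, 0.7, 0.7, 0.7, 0.7, 1.4, 1.3],
    index=pd.to_datetime([
        "2009-12-02 20:00:00",
        "2009-12-02 20:30:00",
        "2009-12-03 01:00:00",
        "2009-12-03 02:30:00",
        "2009-12-03 11:30:00",
        "2009-12-03 12:00:00",
        "2009-12-03 12:30:00",
    ]),
)


def find_events(series, max_gap=pd.Timedelta(hours=2)):
    """Return one row per event with its first and last timestamp.

    Hypothetical helper: name and signature are illustrative only.
    """
    stamps = series.index.to_series()
    # Gap to the previous timestamp; the first gap is NaT, which compares as False.
    new_event = stamps.diff() > max_gap
    # Every True marks the start of a new event, so the running sum is an event id.
    event_id = new_event.cumsum()
    # Named aggregation gives readable column names for the stored result.
    return stamps.groupby(event_id).agg(start="first", end="last")


events = find_events(s)
print(events)
```

The result is a DataFrame, so the start and end of each event are stored and can be looked up later, for example with events.loc[1, "start"].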

Upvotes: 3

PasteBT

Reputation: 2198

Assuming t.txt holds the log data, one timestamped line per row:

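from datetime import datetime

# d1 = start of the current event, d2 = last timestamp seen, d3 = current line
d1 = d2 = d3 = None
with open('t.txt') as f:
    for line in f:
        d3 = datetime.strptime(line[:19], '%Y-%m-%d %H:%M:%S')
        if d1 is None:
            d1 = d2 = d3
        elif (d3 - d2).total_seconds() > 2 * 3600:
            # Gap of more than 2 hours: close the current event, start a new one.
            # total_seconds() counts whole days too; .seconds alone would drop them.
            print(d1, d2)
            d1 = d2 = d3
        else:
            d2 = d3
print(d1, d2)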
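
Because the question also asks to store the events, the same loop could be wrapped in a function that returns them instead of printing. This is a rough sketch under assumptions: split_events, max_gap, and the in-memory sample list are names invented here for illustration, and the lines are assumed to start with a 19-character timestamp as in the question.

```python
from datetime import datetime, timedelta


def split_events(lines, max_gap=timedelta(hours=2)):
    """Group timestamped lines into (start, end) pairs, split at large gaps.

    Hypothetical helper: name and signature are illustrative only.
    """
    events = []
    start = end = None
    for line in lines:
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        if start is None:
            start = end = stamp
        elif stamp - end > max_gap:
            # Comparing timedelta objects directly also handles gaps of a day or more.
            events.append((start, end))
            start = end = stamp
        else:
            end = stamp
    if start is not None:
        # Close the last open event (skipped when the input is empty).
        events.append((start, end))
    return events


# Sample lines copied from the question.
sample = [
    "2009-12-02 20:00:00    0.6",
    "2009-12-02 20:30:00    0.7",
    "2009-12-03 01:00:00    0.7",
    "2009-12-03 02:30:00    0.7",
    "2009-12-03 11:30:00    0.7",
    "2009-12-03 12:00:00    1.4",
    "2009-12-03 12:30:00    1.3",
]

events = split_events(sample)
for start, end in events:
    print(start, "-", end)
```

To read from a file instead, an open file object can be passed in place of the sample list, since the function only iterates over lines.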

Upvotes: 0
