Judking
Judking

Reputation: 6381

How to apply time-window related data analysis via Pandas in Python

Let's assume there're two dataframes, recording two different operations (who at which time did the operation) respectively.

# operation A
df1 = pd.DataFrame([['a', 138], ['a', 140], ['b', 150]], columns=['name', 'timestamp'])
print df1

# operation B
df2 = pd.DataFrame([['a', 145], ['b', 170]], columns=['name', 'timestamp'])
print df2

--output--
  name  timestamp
0    a        138
1    a        140
2    b        150
  name  timestamp
0    a        145
1    b        170

What I intend to do is to retrieve consecutive operations A->B where B happens within 10 units against previous A happening time.

In the example above, (B, a, 145) and (A, a, 140) will be a qualified consecutive operations, yet (B, b, 170) and (A, b, 150) will not be since 170-150=20>10 units.

Could anyone give me the most efficient way to processing this model is the data is relatively large (10,000-100,000)? Many thanks.

Upvotes: 1

Views: 61

Answers (1)

dmb
dmb

Reputation: 1719

First you would create a new column that is the difference between consecutive timestamps

df1.loc[:, 'time_diff'] = df1.timestamp.diff()
df1
  name  timestamp  time_diff
0    a        138        NaN
1    a        140          2
2    b        150         10

df2.loc[:, 'time_diff'] = df2.timestamp.diff()

then you want to see when name transitions. you can do this using duplicated. You also want to ignore the first row (so when time_diff is null). This will tell you the rows where a transition happened, and you can examine the time_diff column however you want

df1.loc[~df1.name.duplicated() & df1.time_diff.notnull(),:]
df1
  name  timestamp  time_diff
2    b        150         10

vs.

df2.loc[~df2.name.duplicated() & df2.time_diff.notnull(),:]
df2
  name  timestamp  time_diff
1    b        170         25

so if you have tons of rows, you can have some logic like

df.loc[~df.name.duplicated() & df.time_diff.notnull() & (df.time_diff <= 10)]

Upvotes: 1

Related Questions