Reputation: 29
I have a dataframe with a datetime index and a time series of integer values, one per day. I want to identify occurrences where the time series stays above a threshold for at least 2 consecutive days. For these events, I want to count how many occur over the entire span and find the start date of each one.
One issue is making sure I don't overcount when an event lasts more than 2 days: as long as the values stay above the threshold, it should count as 1 event, whether it lasts 2 days or 10 days.
I can do this using a function with lots of if statements, but it's very kludgy. I'd like to learn a more pandas/pythonic way of doing it.
I started by looking at a masked version of the data containing only the values above the threshold (arbitrary here) and using diff(), which seemed promising, but I'm still stuck. Any help is appreciated.
import numpy as np
import pandas as pd
dates = pd.date_range('2012-01-01', periods=100, freq='D')
values = np.random.randint(100, size=len(dates))
df = pd.DataFrame({'timeseries':values}, index=dates)
arb_thr = 20  # arbitrary threshold
df.loc[df['timeseries'] > arb_thr].index.to_series().diff().head(20)
Upvotes: 2
Views: 1415
Reputation: 42886
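You can use a boolean mask to flag the rows that are below the threshold, then cumsum the flags so that the label increases by one at each flagged row,
and finally groupby on those labels. Each group that starts at a flagged row contains that row followed by the run of rows at or above the threshold: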
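arb_thr = 20
# Move the unnamed DatetimeIndex into a regular column called "index"
df = df.reset_index()
# The label increments at every row below the threshold
grps = df["timeseries"].lt(arb_thr).cumsum()
# First date, last date and number of rows in each group
result = df.groupby(grps).agg(
    min_date=("index", "min"),
    max_date=("index", "max"),
    count=("timeseries", "count")
).rename_axis(None, axis=0)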
     min_date   max_date  count
0  2012-01-01 2012-01-09      9
1  2012-01-10 2012-01-11      2
2  2012-01-12 2012-01-12      1
3  2012-01-13 2012-01-22     10
4  2012-01-23 2012-01-24      2
5  2012-01-25 2012-02-04     11
6  2012-02-05 2012-02-07      3
7  2012-02-08 2012-02-08      1
8  2012-02-09 2012-02-10      2
9  2012-02-11 2012-02-12      2
10 2012-02-13 2012-02-15      3
11 2012-02-16 2012-02-20      5
12 2012-02-21 2012-02-21      1
13 2012-02-22 2012-02-23      2
14 2012-02-24 2012-02-25      2
15 2012-02-26 2012-03-04      8
16 2012-03-05 2012-03-07      3
17 2012-03-08 2012-03-20     13
18 2012-03-21 2012-03-22      2
19 2012-03-23 2012-03-23      1
20 2012-03-24 2012-03-24      1
21 2012-03-25 2012-03-28      4
22 2012-03-29 2012-03-29      1
23 2012-03-30 2012-04-01      3
24 2012-04-02 2012-04-08      7
25 2012-04-09 2012-04-09      1
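The question asks only for runs that stay strictly above the threshold for at least 2 consecutive days, so the grouped result would still need filtering. Below is a minimal sketch of one way to get there directly, not necessarily the only or best one. It assumes a small hand-made series in place of the random data so the result can be checked by eye; `min_days`, `above`, `run_id`, `runs` and `events` are illustrative names, not part of the original post.

```python
import pandas as pd

# Hypothetical toy data (not from the original post) so the result is reproducible.
dates = pd.date_range("2012-01-01", periods=10, freq="D")
df = pd.DataFrame({"timeseries": [5, 30, 40, 10, 25, 8, 50, 60, 70, 3]}, index=dates)

arb_thr = 20   # threshold value used in the answer
min_days = 2   # assumed minimum run length that counts as an event

# True on days strictly above the threshold
above = df["timeseries"] > arb_thr

# diff() is non-zero wherever the flag flips, so cumsum gives every
# consecutive run (above or not) its own label
run_id = above.astype(int).diff().ne(0).cumsum()

# Keep only the above-threshold days and summarise each run
dates_above = df.index.to_series()[above]
runs = dates_above.groupby(run_id[above]).agg(start="min", end="max", days="count")

# An event is a run lasting at least min_days
events = runs[runs["days"] >= min_days].reset_index(drop=True)

print(events)
print("number of events:", len(events))
```

With this toy data there are two events, starting on 2012-01-02 (2 days) and 2012-01-07 (3 days). The single above-threshold day on 2012-01-05 is dropped because it is shorter than `min_days`. Swapping in the random `df` from the question should work the same way, as long as the dates are still in the index.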
Upvotes: 2