BAC83
BAC83

Reputation: 891

Count on a rolling time window in pandas

I'm trying to return a count on a time window about a (moving) fixed point.

It's an attempt to understand the condition of an instrument at any time, as a function of usage prior to it.

So if the instrument is used at 12.05pm, 12.10, 12.15, 12.30, 12.40 and 1pm, the usage counts would be:

12.05 -> 1 (once in the last hour)

12.10 -> 2

12.15 -> 3

12.30 -> 4

12.40 -> 5

1.00 -> 6

... but then lets say usage resumes at 1.06: 1.06 -> 6 this doesn't increase the count, as the first run is over an hour ago.

How can I calculate this count and append it as a column?

It feels like this is an groupby/aggregate/count using possibly timedeltas in a lambda function, but I don't know where to start past that.

I'd like to be able to play with the time window too, so not just the past hour, but the hour surrounding an instance i.e. + and -30 minutes.

The following code gives a starting dataframe:

s = pd.Series(pd.date_range('2020-1-1', periods=8000, freq='250s'))
df = pd.DataFrame({'Run time': s})
df_sample = df.sample(6000)
df_sample = df_sample.sort_index()

The best help i found (and to be fair i can usually hack together from the logic) was this Distinct count on a rolling time window but i've not managed this time.

Thanks

Upvotes: 4

Views: 6847

Answers (1)

Fred
Fred

Reputation: 502

I've done something similar previously with the DataFrame.rolling function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html

So for your dataset, first you need to update the index to the datetime field, then you can preform the analysis you need, so continuing on from your code:

s = pd.Series(pd.date_range('2020-1-1', periods=8000, freq='250s'))
df = pd.DataFrame({'Run time': s})
df_sample = df.sample(6000)
df_sample = df_sample.sort_index()

# Create a value we can count
df_sample('Occurrences') = 1

# Set the index to the datetime element
df_sample = df_sample.set_index('Run time')

# Use Pandas rolling method, 3600s = 1 Hour
df_sample['Occurrences in Last Hour'] = df_sample['Occurrences'].rolling('3600s').sum()

df_sample.head(15)

                     Occurrences  Occurrences in Last Hour
Run time                                                   
2020-01-01 00:00:00            1                       1.0
2020-01-01 00:04:10            1                       2.0
2020-01-01 00:08:20            1                       3.0
2020-01-01 00:12:30            1                       4.0
2020-01-01 00:16:40            1                       5.0
2020-01-01 00:25:00            1                       6.0
2020-01-01 00:29:10            1                       7.0
2020-01-01 00:37:30            1                       8.0
2020-01-01 00:50:00            1                       9.0
2020-01-01 00:54:10            1                      10.0
2020-01-01 00:58:20            1                      11.0
2020-01-01 01:02:30            1                      11.0
2020-01-01 01:06:40            1                      11.0
2020-01-01 01:15:00            1                      10.0
2020-01-01 01:19:10            1                      10.0

You need to set the index to a datetime element to utilised the time base window, otherwise you can only use integer values corresponding to the number of rows.

Upvotes: 7

Related Questions