PeterL
PeterL

Reputation: 515

Finding a range in the series with the most frequent occurrence of the entries over a defined time (in Pandas)

I have a large dataset in Pandas in which the entries are marked with a time stamp. I'm looking for a solution how to get a range of a defined length (like 1 minute) with the highest occurrence of entries.

One solution could be to resample the data to a higher timeframe (such as a minute) and comparing the sections with the highest number of values. However, It would only find ranges that correspond to the start and end time of the given timeframe.

I'd rather find a solution to find any 1-minute ranges no matter where they actually start.

In following example I would be looking for 1 minute “window” with highest occurrence of the entries starting with the first signal in the range and ending with last signal in the range:

8:50:00
8:50:01
8:50:03
8:55:00
8:59:10
9:00:01
9:00:02
9:00:03
9:00:04
9:05:00

Thus I would like to get range 8:59:10 - 9:00:04

Any hint how to accomplish this?

Upvotes: 1

Views: 166

Answers (1)

James
James

Reputation: 36691

You need to create 1 minute windows with a sliding start time of 1 second; compute the maximum occurrence for any of the windows. In pandas 0.19.0 or greater, you can resample a time series using base as an argument to start the resampled windows at different times.

I used tempfile to copy your data as a toy data set below.

import tempfile
import pandas as pd

tf = tempfile.TemporaryFile()
tf.write(b'''8:50:00
8:50:01
8:50:03
8:55:00
8:59:10
9:00:01
9:00:02
9:00:03
9:00:04
9:05:00''')
tf.seek(0)

df = pd.read_table(tf, header=None)
df.columns = ['time']
df.time = pd.to_datetime(df.time)

max_vals = []
for t in range(60):
    # .max().max() is not a mistake, use it to return just the value
    max_vals.append(
        (t, df.resample('60s', on='time', base=t).count().max().max())
    )

max(max_vals, key=lambda x: x[-1])
# returns:
(5, 5)

For this toy dataset, an offset of 5 seconds for the window (i.e. 8:49:05, 8:50:05, ...) has the first of the maximum count for a windows of 1 minute with 5 counts.

Upvotes: 1

Related Questions