Reputation: 8101
I have a pandas dataframe that consists of the following columns
col1, col2, _time
The _time column is a datetime object indicating when each row occurred.
I want to resample my dataframe into 10-minute periods, group by both columns, and count the number of rows for each group within every 10-minute period. I want the produced dataframe to have the following columns
col1 col2 since until count
where since is the beginning of each 10-minute period, until is the end of that period, and count is the number of rows from the initial dataframe that fall within it, something like
col1 col2 since until count
1 1 08/12/2017 12:00 08/12/2017 12:10 10
1 2 08/12/2017 12:00 08/12/2017 12:10 5
1 1 08/12/2017 12:10 08/12/2017 12:20 3
Is this possible with the resample method of dataframes?
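For reference, a made-up frame with the layout described above (the values are only illustrative) could look like this:
import pandas as pd
# illustrative data only; the real values come from my application
df = pd.DataFrame({
    'col1': [1, 1, 1, 2],
    'col2': [1, 2, 1, 1],
    '_time': pd.to_datetime(['2017-12-08 12:01', '2017-12-08 12:04',
                             '2017-12-08 12:07', '2017-12-08 12:12']),
})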
Upvotes: 1
Views: 611
Reputation: 18916
This example may help if you are still looking for an answer.
import pandas as pd
import numpy as np
import datetime
# create some random data
df = pd.DataFrame(columns=["col1","col2","timestamp"])
df.col1 = np.random.randint(100, size = 10)
df.col2 = np.random.randint(100, size = 10)
df.timestamp = [datetime.datetime(2000, 1, 1) + datetime.timedelta(hours=int(i))
                for i in np.random.randint(100, size=10)]
# sort data by timestamp and reset index
df = df.sort_values(by="timestamp").reset_index(drop=True)
# create the bins from the first and last timestamps with a 6h frequency
bins = pd.date_range(start=df.timestamp.values[0],end=df.timestamp.values[-1], freq="6h") # change to reasonable freq (d, h, m, s)
# zip them to pairs
startend = list(zip(bins, bins.shift(1)))
# define a function that finds bin index
def time_in_range(x):
    """Return the index of the bin whose [start, end] range contains x."""
    for ind, (start, end) in enumerate(startend):
        if start <= x <= end:
            return ind
# Add bin index to column named index
df['index'] = df.timestamp.apply(time_in_range)
# groupby index to find sum and count
df = df.groupby('index')[["col1", "col2"]].agg(['sum', 'count']).reset_index()
# Create output df2 (with bins)
df2 = pd.DataFrame(startend, columns=["start","end"]).reset_index()
# Join the two dataframes with column index
df3 = pd.merge(df2, df, how='outer', on='index').fillna(0)
# Final adjustments
df3.columns = ["index","start","end","col1","delete","col2","count"]
df3.drop(['delete','index'], axis=1, inplace=True)
Outputs:
<table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>start</th> <th>end</th> <th>col1</th> <th>col2</th> <th>count</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>2000-01-01 21:00:00</td> <td>2000-01-02 03:00:00</td> <td>89.0</td> <td>136.0</td> <td>2.0</td> </tr> <tr> <th>1</th> <td>2000-01-02 03:00:00</td> <td>2000-01-02 09:00:00</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> </tr> <tr> <th>2</th> <td>2000-01-02 09:00:00</td> <td>2000-01-02 15:00:00</td> <td>69.0</td> <td>27.0</td> <td>1.0</td> </tr> <tr> <th>3</th> <td>2000-01-02 15:00:00</td> <td>2000-01-02 21:00:00</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> </tr> <tr> <th>4</th> <td>2000-01-02 21:00:00</td> <td>2000-01-03 03:00:00</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> </tr> <tr> <th>5</th> <td>2000-01-03 03:00:00</td> <td>2000-01-03 09:00:00</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> </tr> <tr> <th>6</th> <td>2000-01-03 09:00:00</td> <td>2000-01-03 15:00:00</td> <td>108.0</td> <td>57.0</td> <td>2.0</td> </tr> <tr> <th>7</th> <td>2000-01-03 15:00:00</td> <td>2000-01-03 21:00:00</td> <td>35.0</td> <td>85.0</td> <td>2.0</td> </tr> <tr> <th>8</th> <td>2000-01-03 21:00:00</td> <td>2000-01-04 03:00:00</td> <td>102.0</td> <td>92.0</td> <td>2.0</td> </tr> <tr> <th>9</th> <td>2000-01-04 03:00:00</td> <td>2000-01-04 09:00:00</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> </tr> <tr> <th>10</th> <td>2000-01-04 09:00:00</td> <td>2000-01-04 15:00:00</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> </tr> <tr> <th>11</th> <td>2000-01-04 15:00:00</td> <td>2000-01-04 21:00:00</td> <td>91.0</td> <td>3.0</td> <td>1.0</td> </tr> </tbody></table>
Upvotes: 0
Reputation: 1848
I too have previously been looking at resample for this, to no avail. Luckily, I found a solution using pd.Series.dt.floor! Use .dt.floor to align your timestamps to 10-minute intervals and pd.to_timedelta to calculate the until column from your since column. For instance,
import pandas as pd
interval = '10min' # 10-minute intervals, please
# Dummy data with 3-minute intervals
data = pd.DataFrame({
'col1': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
'col2': [4, 4, 4, 3, 4, 4, 3, 3, 4, 4],
'_time': pd.date_range(start='2010-01-01 00:01:00', freq='3min', periods=10),
})
# Floor the timestamps to your desired interval
since = data['_time'].dt.floor(interval).rename('since')
# Get the size of each group - groups are in the index of `agg`
agg = data.groupby(['col1', 'col2', since]).size()
agg = agg.rename('count')
# Back to dataframe
agg = agg.reset_index()
# Simply add your interval to `since`
agg['until'] = agg['since'] + pd.to_timedelta(interval)
print(agg)
col1 col2 since count until
0 0 3 2010-01-01 00:10:00 1 2010-01-01 00:20:00
1 0 3 2010-01-01 00:20:00 1 2010-01-01 00:30:00
2 0 4 2010-01-01 00:00:00 2 2010-01-01 00:10:00
3 0 4 2010-01-01 00:10:00 2 2010-01-01 00:20:00
4 1 3 2010-01-01 00:10:00 1 2010-01-01 00:20:00
5 1 4 2010-01-01 00:00:00 1 2010-01-01 00:10:00
6 1 4 2010-01-01 00:20:00 2 2010-01-01 00:30:00
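To connect this back to the question about resample: roughly the same table can be produced with pd.Grouper, which lets groupby do the 10-minute binning itself. A minimal sketch, assuming the same data and interval as above (groups with zero rows are dropped, just as with the .dt.floor approach):
# sketch: equivalent grouping using pd.Grouper instead of .dt.floor
agg2 = (
    data.groupby(['col1', 'col2', pd.Grouper(key='_time', freq=interval)])
        .size()
        .rename('count')
        .reset_index()
        .rename(columns={'_time': 'since'})
)
agg2['until'] = agg2['since'] + pd.to_timedelta(interval)
print(agg2)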
Upvotes: 2