Pandas event dataframe (start stop) to regular time series

Question

I am dealing with a dataset of events. Every row has a start time, an end time and a value.

It looks like this one:

df = pd.DataFrame({'start': ['2015-01-05 12:21:00', '2015-01-05 18:01:23', '2015-01-05 23:11:01'], 'end': ['2015-01-05 13:18:45', '2015-01-05 21:03:51', '2015-01-05 12:08:11'], 'value': [3, 4, 5]})

    end start   value
0   2015-01-07 11:18:45 2015-01-07 11:35:00 3
1   2015-01-07 12:08:11 2015-01-07 23:11:01 5
2   2015-01-07 17:03:51 2015-01-07 18:01:23 4

I need to resample it in a 10 minutes regular time series like the following:

    time values
1   2015-01-07 11:01:00  NULL
2   2015-01-07 11:11:00  3
3   2015-01-07 11:21:00  3
4   2015-01-07 11:31:00  3
5   2015-01-07 11:41:00  NULL
...
..  2015-01-07 12:11:00  5
...
..  2015-01-07 17:01:00  5,4
...

What is the most efficient way to deal with this resampling? I also highlighted the possibility of multiple events with the last value "5,4", what could be the best way to deal with it?

Thanks!

ALollz · Accepted Answer

Well if you eventually need to merge with other DataFrames it will likely make your life easier to standardize the time period always assuming an interval goes from :00-:10 for instance.

In this case one possibility is to transform the time span dataframe into one where each period is listed out explicitly, and then you can groupby the time period and then apply(list) to get the output you want.

df['temp_id'] = range(len(df))
df_time = (df[['temp_id', 'start', 'end']].set_index('temp_id').stack()
          .reset_index(level=-1, drop=True).rename('time').to_frame())

df_time = (df_time.groupby('temp_id').apply(lambda x: x.set_index('time')
            .resample('10T').asfreq()).reset_index())
df_time = df_time.merge(df[['value', 'temp_id']]).groupby('time')['value'].apply(list).to_frame()

If you then want to include the indices that have nothing you can just reindex this dataframe.

df_time.reindex(pd.date_range(start=df_time.index.min(),
     end=df_time.index.max(), freq='10T'))

#                    value
#2015-01-07 11:10:00   [3]
#2015-01-07 11:20:00   [3]
#2015-01-07 11:30:00   [3]
#2015-01-07 11:40:00   NaN
#2015-01-07 11:50:00   NaN
#...
#2015-01-07 17:40:00  [4, 5]
#2015-01-07 17:50:00  [4, 5]
#2015-01-07 18:00:00  [4, 5]
#2015-01-07 18:10:00     [4]
#...

Pandas event dataframe (start stop) to regular time series

Answers (1)

Related Questions