Reputation: 453
I am dealing with a dataset of events. Every row has a start time, an end time and a value.
It looks like this one:
df = pd.DataFrame({'start': ['2015-01-05 12:21:00', '2015-01-05 18:01:23', '2015-01-05 23:11:01'], 'end': ['2015-01-05 13:18:45', '2015-01-05 21:03:51', '2015-01-05 12:08:11'], 'value': [3, 4, 5]})
end start value
0 2015-01-07 11:18:45 2015-01-07 11:35:00 3
1 2015-01-07 12:08:11 2015-01-07 23:11:01 5
2 2015-01-07 17:03:51 2015-01-07 18:01:23 4
I need to resample it in a 10 minutes regular time series like the following:
time values
1 2015-01-07 11:01:00 NULL
2 2015-01-07 11:11:00 3
3 2015-01-07 11:21:00 3
4 2015-01-07 11:31:00 3
5 2015-01-07 11:41:00 NULL
...
.. 2015-01-07 12:11:00 5
...
.. 2015-01-07 17:01:00 5,4
...
What is the most efficient way to deal with this resampling? I also highlighted the possibility of multiple events with the last value "5,4", what could be the best way to deal with it?
Thanks!
Upvotes: 1
Views: 1778
Reputation: 59549
Well if you eventually need to merge with other DataFrames
it will likely make your life easier to standardize the time period always assuming an interval goes from :00-:10 for instance.
In this case one possibility is to transform the time span dataframe into one where each period is listed out explicitly, and then you can groupby
the time period and then apply(list)
to get the output you want.
df['temp_id'] = range(len(df))
df_time = (df[['temp_id', 'start', 'end']].set_index('temp_id').stack()
.reset_index(level=-1, drop=True).rename('time').to_frame())
df_time = (df_time.groupby('temp_id').apply(lambda x: x.set_index('time')
.resample('10T').asfreq()).reset_index())
df_time = df_time.merge(df[['value', 'temp_id']]).groupby('time')['value'].apply(list).to_frame()
If you then want to include the indices that have nothing you can just reindex this dataframe.
df_time.reindex(pd.date_range(start=df_time.index.min(),
end=df_time.index.max(), freq='10T'))
# value
#2015-01-07 11:10:00 [3]
#2015-01-07 11:20:00 [3]
#2015-01-07 11:30:00 [3]
#2015-01-07 11:40:00 NaN
#2015-01-07 11:50:00 NaN
#...
#2015-01-07 17:40:00 [4, 5]
#2015-01-07 17:50:00 [4, 5]
#2015-01-07 18:00:00 [4, 5]
#2015-01-07 18:10:00 [4]
#...
Upvotes: 1