Reputation: 4687
I have a time series that indicates location changes, like this:
08-09-2018 17:00:00, user_1, home
08-09-2018 18:30:00, user_2, home
08-09-2018 18:40:00, user_1, recreation center
I need to create "buckets" (in this example, every 15 minutes) and fill each bucket with each user's location from the previous bucket, carried forward until it changes, like this:
08-09-2018 17:00:00, user_1, home
08-09-2018 17:15:00, user_1, home
08-09-2018 17:30:00, user_1, home
08-09-2018 17:45:00, user_1, home
08-09-2018 18:00:00, user_1, home
08-09-2018 18:15:00, user_1, home
08-09-2018 18:30:00, user_1, home
08-09-2018 18:30:00, user_2, home
08-09-2018 18:45:00, user_1, recreation center
08-09-2018 18:45:00, user_2, home
08-09-2018 19:00:00, user_1, recreation center
08-09-2018 19:00:00, user_2, home
From there I will create dummy variables for the location names, but I already know how to do that part. If it helps, feel free to group it like this:
pd.crosstab([locationDf.date, locationDf.user], locationDf.location)
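For reference, here is a quick sketch of what that crosstab produces on the three sample events. It assumes the dates are month-day-year (matching the answer's output below) and that the frame is named locationDf as in the line above:

```python
import pandas as pd

# The question's three events; month-day-year date parsing is an assumption
locationDf = pd.DataFrame({
    'date': pd.to_datetime(['08-09-2018 17:00:00', '08-09-2018 18:30:00',
                            '08-09-2018 18:40:00'], format='%m-%d-%Y %H:%M:%S'),
    'user': ['user_1', 'user_2', 'user_1'],
    'location': ['home', 'home', 'recreation center'],
})

# Rows: one per (date, user) pair; columns: one per location;
# cells: number of events in that combination (0 or 1 here)
dummies = pd.crosstab([locationDf.date, locationDf.user], locationDf.location)
print(dummies)
```

Each (date, user) row ends up with one indicator column per location, which is the dummy encoding described above.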
How can I do the first part?
I can do it like this:
for user, user_loc_dc in locDf.groupby('user'):
    user_loc_dc.resample('15T').agg('max').ffill()  # just append these
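A minimal sketch of finishing that loop and appending the pieces, assuming locDf holds the three sample events with month-day-year dates. It swaps agg('max') for reindex(..., method='ffill') on one shared grid. The reason is that agg('max') takes the alphabetically largest value in each bucket and labels a bucket by its start, so the 18:40 move would land at 18:30 instead of 18:45. A shared grid also carries user_2 forward to 19:00. The 19:00 end point is an assumption taken from the desired output:

```python
import pandas as pd

# Hypothetical locDf built from the question's events, indexed by timestamp
locDf = pd.DataFrame({
    'date': pd.to_datetime(['08-09-2018 17:00:00', '08-09-2018 18:30:00',
                            '08-09-2018 18:40:00'], format='%m-%d-%Y %H:%M:%S'),
    'user': ['user_1', 'user_2', 'user_1'],
    'location': ['home', 'home', 'recreation center'],
}).set_index('date').sort_index()

# One 15-minute grid shared by every user (ending at 19:00 is an assumption)
grid = pd.date_range(locDf.index.min().floor('15min'),
                     pd.Timestamp('2018-08-09 19:00'), freq='15min', name='date')

pieces = []
for user, user_loc_dc in locDf.groupby('user'):
    # Each slot gets this user's last location at or before that time;
    # slots before the user's first event are NaN and get dropped
    filled = user_loc_dc['location'].reindex(grid, method='ffill').dropna()
    pieces.append(filled.to_frame().assign(user=user))

# "Just append these": combine the per-user frames, ordered by time, then user
result = (pd.concat(pieces)
          .reset_index()
          .sort_values(['date', 'user'])
          .reset_index(drop=True)[['date', 'user', 'location']])
print(result)
```

On the sample data this gives the 12 rows listed in the desired output above.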
Upvotes: 0
Views: 39
Reputation: 9019
Use Series.resample() and ffill():
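import pandas as pd

dates = [pd.Timestamp('08-09-2018 17:00:00'), pd.Timestamp('08-09-2018 18:30:00'), pd.Timestamp('08-09-2018 18:40:00'), pd.Timestamp('08-09-2018 19:00:00')]
data = [['user_1', 'home'], ['user_2', 'home'], ['user_1', 'recreation center'], ['user_2', 'home']]
# Upsample to 15-minute slots; each slot takes the last row at or before it
# ('15T' is the older alias; recent pandas versions prefer '15min')
resampled = pd.Series(data, dates).resample('15T').ffill()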
Yields:
2018-08-09 17:00:00 [user_1, home]
2018-08-09 17:15:00 [user_1, home]
2018-08-09 17:30:00 [user_1, home]
2018-08-09 17:45:00 [user_1, home]
2018-08-09 18:00:00 [user_1, home]
2018-08-09 18:15:00 [user_1, home]
2018-08-09 18:30:00 [user_2, home]
2018-08-09 18:45:00 [user_1, recreation center]
2018-08-09 19:00:00 [user_2, home]
Freq: 15T, dtype: object
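The single Series above keeps one entry per 15-minute slot, so when two users are active the later event replaces the other user. For example, at 18:30 only user_2 appears. One alternative that keeps a row per user per slot is pd.merge_asof with by='user'. Below is a minimal sketch on the question's three events, assuming month-day-year dates and a grid that ends at 19:00 as in the question's desired output:

```python
import pandas as pd

# The question's three events (month-day-year parsing is an assumption)
events = pd.DataFrame({
    'date': pd.to_datetime(['08-09-2018 17:00:00', '08-09-2018 18:30:00',
                            '08-09-2018 18:40:00'], format='%m-%d-%Y %H:%M:%S'),
    'user': ['user_1', 'user_2', 'user_1'],
    'location': ['home', 'home', 'recreation center'],
}).sort_values('date')

# Every (slot, user) pair on a 15-minute grid, sorted by slot time
grid = pd.date_range('2018-08-09 17:00', '2018-08-09 19:00', freq='15min')
slots = pd.MultiIndex.from_product(
    [grid, sorted(events['user'].unique())], names=['date', 'user']
).to_frame(index=False)

# For each (slot, user), take that user's most recent event at or before the slot;
# slots before a user's first event have no match and are dropped
result = (pd.merge_asof(slots, events, on='date', by='user', direction='backward')
          .dropna(subset=['location'])
          .reset_index(drop=True))
print(result)
```

merge_asof requires both frames to be sorted on the 'on' column, which is why the events are sorted and the grid is built time-major.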
Upvotes: 1