roberto tomás

Reputation: 4687

pandas - create repeated data from change-indications in time series

I have a time series that indicates location changes, like this:

08-09-2018 17:00:00, user_1, home
08-09-2018 18:30:00, user_2, home
08-09-2018 18:40:00, user_1, recreation center

I need to create "buckets" (in this example, maybe every 15 minutes) and fill every bucket with the value from the previous bucket, like this:

08-09-2018 17:00:00, user_1, home
08-09-2018 17:15:00, user_1, home
08-09-2018 17:30:00, user_1, home
08-09-2018 17:45:00, user_1, home
08-09-2018 18:00:00, user_1, home
08-09-2018 18:15:00, user_1, home
08-09-2018 18:30:00, user_1, home
08-09-2018 18:30:00, user_2, home
08-09-2018 18:45:00, user_1, recreation center
08-09-2018 18:45:00, user_2, home
08-09-2018 19:00:00, user_1, recreation center
08-09-2018 19:00:00, user_2, home

From there I will get dummy data for the location names, but that part I know how to do :) If it helps, feel free to group it like this:

 pd.crosstab([locationDf.date, locationDf.user], locationDf.location)

How can I do the first part?

I can do it like this:

for user, user_loc_dc in locDf.groupby('user'):
    user_loc_dc.resample('15T').agg('max').ffill()  # just append these

Upvotes: 0

Views: 39

Answers (1)

rahlf23

Reputation: 9019

Use Series.resample() and ffill():

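import pandas as pd

# Timestamps of the location changes; the 19:00 entry is not in the question's data and extends the range
dates = [pd.Timestamp('08-09-2018 17:00:00'), pd.Timestamp('08-09-2018 18:30:00'), pd.Timestamp('08-09-2018 18:40:00'), pd.Timestamp('08-09-2018 19:00:00')]

# One [user, location] pair per timestamp
data = [['user_1', 'home'], ['user_2', 'home'], ['user_1', 'recreation center'], ['user_2', 'home']]

# Resample into 15-minute buckets, carrying the last seen value forward
resampled = pd.Series(data, dates).resample('15T').ffill()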

Yields:

2018-08-09 17:00:00                 [user_1, home]
2018-08-09 17:15:00                 [user_1, home]
2018-08-09 17:30:00                 [user_1, home]
2018-08-09 17:45:00                 [user_1, home]
2018-08-09 18:00:00                 [user_1, home]
2018-08-09 18:15:00                 [user_1, home]
2018-08-09 18:30:00                 [user_2, home]
2018-08-09 18:45:00    [user_1, recreation center]
2018-08-09 19:00:00                 [user_2, home]
Freq: 15T, dtype: object
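
Note that the series above keeps one value per bucket for all users combined, so it does not reproduce the per-user rows shown in the question. A minimal sketch of a per-user variant follows. It is not the resample() call above: it rounds each change up to the next bucket boundary, then uses reindex() and ffill() within each user. The column names date, user and location are assumed from the question's crosstab call, and fill_buckets, location_df and the end argument are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Sample data from the question; the column names are an assumption.
location_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-08-09 17:00:00", "2018-08-09 18:30:00", "2018-08-09 18:40:00"]
        ),
        "user": ["user_1", "user_2", "user_1"],
        "location": ["home", "home", "recreation center"],
    }
)


def fill_buckets(df, freq="15min", end=None):
    """Hypothetical helper: repeat each user's last known location per bucket."""
    # Default end of the range: the last change, rounded up to a bucket boundary.
    end = df["date"].max().ceil(freq) if end is None else pd.Timestamp(end)
    parts = []
    for user, group in df.sort_values("date").groupby("user"):
        # Round each change up to the first bucket in which it is visible.
        s = group.set_index(group["date"].dt.ceil(freq))["location"]
        # If several changes land in one bucket, keep the latest one.
        s = s[~s.index.duplicated(keep="last")]
        # Buckets run from the user's first observation to the common end.
        buckets = pd.date_range(s.index[0], end, freq=freq)
        filled = s.reindex(buckets).ffill()
        parts.append(
            pd.DataFrame(
                {"date": buckets, "user": user, "location": filled.to_numpy()}
            )
        )
    return pd.concat(parts).sort_values(["date", "user"], ignore_index=True)


filled_df = fill_buckets(location_df, end="2018-08-09 19:00:00")
print(filled_df)
```

Each user starts at their own first observation, so user_2 has no rows before 18:30. The result can then be passed to pd.crosstab([filled_df.date, filled_df.user], filled_df.location) as in the question.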

Upvotes: 1
