Reputation: 210
I have a DataFrame containing information about stores. It looks like the following:
date | store_id | x
2019-01-01| 1 | 5
2019-01-01| 2 | 1
2019-01-05| 1 | 3
...
The multi-index is [date
,store_id
]. Note that the dates are not unique.
I want to resample the data at an hourly level, but only for the days in the date
column i.e. I don't want to fill in every hour in between. Furthermore, I want to fill in the value of x for every hour that is created. So the desired result for the above example would be
date | store_id | x
2019-01-01 00:00:00| 1 | 5
2019-01-01 01:00:00| 1 | 5
2019-01-01 02:00:00| 1 | 5
...
2019-01-01 23:00:00| 1 | 5
2019-01-01 00:00:00| 2 | 1
2019-01-01 01:00:00| 2 | 1
2019-01-01 02:00:00| 2 | 1
...
2019-01-01 23:00:00| 2 | 1
2019-01-05 00:00:00| 1 | 3
2019-01-05 01:00:00| 1 | 3
2019-01-05 02:00:00| 1 | 3
...
2019-01-05 23:00:00| 1 | 3
Upvotes: 0
Views: 55
Reputation: 30971
Define the following "replication" function:
def repl(row):
return pd.DataFrame({'date': pd.date_range(start=row.date,
periods=24, freq='H'),'store_id': row.store_id, 'x': row.x})
It "replicates" the source row (parameter), returning a sequence of rows with the given date, for consecutive hours.
Then:
The code to do it is:
pd.concat(df.reset_index().apply(repl, axis=1).tolist(), ignore_index=True)
Upvotes: 1