pwerth
pwerth

Reputation: 210

Pandas resampling with multi-index

I have a DataFrame containing information about stores. It looks like the following: date | store_id | x 2019-01-01| 1 | 5 2019-01-01| 2 | 1 2019-01-05| 1 | 3 ...

The multi-index is [date,store_id]. Note that the dates are not unique. I want to resample the data at an hourly level, but only for the days in the date column i.e. I don't want to fill in every hour in between. Furthermore, I want to fill in the value of x for every hour that is created. So the desired result for the above example would be

date | store_id | x 2019-01-01 00:00:00| 1 | 5 2019-01-01 01:00:00| 1 | 5 2019-01-01 02:00:00| 1 | 5 ... 2019-01-01 23:00:00| 1 | 5 2019-01-01 00:00:00| 2 | 1 2019-01-01 01:00:00| 2 | 1 2019-01-01 02:00:00| 2 | 1 ... 2019-01-01 23:00:00| 2 | 1 2019-01-05 00:00:00| 1 | 3 2019-01-05 01:00:00| 1 | 3 2019-01-05 02:00:00| 1 | 3 ... 2019-01-05 23:00:00| 1 | 3

Upvotes: 0

Views: 55

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30971

Define the following "replication" function:

def repl(row):
    return pd.DataFrame({'date': pd.date_range(start=row.date,
        periods=24, freq='H'),'store_id': row.store_id, 'x': row.x})

It "replicates" the source row (parameter), returning a sequence of rows with the given date, for consecutive hours.

Then:

  • reset the index, to have all columns as "normal" columns,
  • apply this function (to each row),
  • convert the resulting Series of DataFrames into a list (of DataFrames),
  • concatenate the result.

The code to do it is:

pd.concat(df.reset_index().apply(repl, axis=1).tolist(), ignore_index=True)

Upvotes: 1

Related Questions