Reputation: 806
I would like a daily breakdown of a dataset by type. There are not records for every day for every type, where they don't exist I would like NaN.
I'm able to get a 'resampled to daily' result, but the type is ommitted.
The code below should be a fully working example (well, working apart from the known error at the end!):
import pandas as pd
import datetime as dt
df = pd.DataFrame({
'Date': [dt.datetime(2021,1,1), dt.datetime(2021, 1, 3), dt.datetime(2020,1,2)],
'Type': ['A', 'A', 'B'],
'Value': [1,2,3]
})
df.set_index('Date', inplace=True)
# this loses the 'type'
print(df.resample('1D').mean())
df = df.reset_index().set_index(['Date', 'Type'])
# this raises an exception "TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'"
print(df.resample('1D').mean())
Output I'm looking for is a row for every day / type combo:
date | type | value |
---|---|---|
20210101 | A | 1 |
20210102 | A | NaN |
20210103 | A | 2 |
20210101 | B | NaN |
20210102 | B | 3 |
20210103 | B | NaN |
Any advice or pointers gratefully received.
Upvotes: 1
Views: 780
Reputation: 862641
If need resample per groups is possible use Grouper
for resample per days and then for add missing values is used Series.unstack
with DataFrame.stack
:
df = (df.groupby(['Type', pd.Grouper(freq='1D', key='Date')])['Value']
.mean()
.unstack()
.stack(dropna=False)
.reset_index(name='Value')
)
print (df)
Type Date Value
0 A 2021-01-01 1.0
1 A 2021-01-02 NaN
2 A 2021-01-03 2.0
3 B 2021-01-01 NaN
4 B 2021-01-02 3.0
5 B 2021-01-03 NaN
If need only append missing datetimes per groups is used DataFrame.reindex
:
mux = pd.MultiIndex.from_product([df['Type'].unique(),
pd.date_range(df['Date'].min(), df['Date'].max())],
names=['Date','Type'])
df = df.set_index(['Type','Date']).reindex(mux).reset_index()
print (df)
Date Type Value
0 A 2021-01-01 1.0
1 A 2021-01-02 NaN
2 A 2021-01-03 2.0
3 B 2021-01-01 NaN
4 B 2021-01-02 3.0
5 B 2021-01-03 NaN
Upvotes: 1