Reputation: 625
I'd like to extend a datetime index that's within a multi-index. However, passing level to reindex isn't working. For example, here's a series with a multi-index. I'd like to reindex the date level (a datetime index) to extend it by one month.
import numpy as np
import pandas as pd
category_idx = pd.Index(['A', 'B'])
date_idx = pd.date_range('2018-01', '2018-02', freq='MS')
idx = pd.MultiIndex.from_product([category_idx, date_idx], names=['category', 'date'])
series = pd.Series(np.random.randn(len(category_idx) * len(date_idx)), index=idx)
series
# category date
# A 2018-01-01 1.052776
# 2018-02-01 -0.032686
# B 2018-01-01 1.745934
# 2018-02-01 -0.759375
# dtype: float64
Here's the new date level, extended by one month.
new_date_idx = date_idx.union([date_idx[-1] + date_idx.freq])
new_date_idx
# DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01'], dtype='datetime64[ns]', freq='MS')
I'd expect the following to show that the series has two new rows, each containing NaN. However, nothing's changed.
series.reindex(index=new_date_idx, level='date')
# category date
# A 2018-01-01 1.052776
# 2018-02-01 -0.032686
# B 2018-01-01 1.745934
# 2018-02-01 -0.759375
# dtype: float64
I expected the behaviour to be the same as reindexing an index.
series.loc['A'].reindex(index=new_date_idx)
# 2018-01-01 1.052776
# 2018-02-01 -0.032686
# 2018-03-01 NaN
# Freq: MS, dtype: float64
Upvotes: 4
Views: 2542
Reputation: 862406
This looks like a bug. The new value does appear in the levels of the resulting MultiIndex, but no codes reference it, so no rows are added:
s = series.reindex(index=new_date_idx, level='date')
print (s.index)
MultiIndex(levels=[['A', 'B'], [2018-01-01 00:00:00,
2018-02-01 00:00:00,
2018-03-01 00:00:00]],
codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['category', 'date'])
A possible solution is to reindex by a full MultiIndex:
mux = pd.MultiIndex.from_product([series.index.levels[0], new_date_idx],
names=series.index.names)
s = series.reindex(mux)
print (s)
category date
A 2018-01-01 0.125677
2018-02-01 0.623794
2018-03-01 NaN
B 2018-01-01 0.175913
2018-02-01 0.711070
2018-03-01 NaN
dtype: float64
print (s.index)
MultiIndex(levels=[['A', 'B'], [2018-01-01 00:00:00,
2018-02-01 00:00:00,
2018-03-01 00:00:00]],
codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
names=['category', 'date'])
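The same idea can be wrapped in a small helper for series with more than two levels. This is only a sketch: reindex_level is a hypothetical name, not a pandas function, and it assumes you want the full Cartesian product of the other levels' observed values with the new values.

```python
import numpy as np
import pandas as pd


def reindex_level(s, level, new_values):
    """Reindex one named level of a MultiIndex-ed Series (hypothetical helper).

    Builds the product of the new values for `level` with the observed
    values of every other level, then reindexes against that product.
    """
    names = list(s.index.names)
    levels = [
        pd.Index(new_values) if name == level
        else s.index.get_level_values(name).unique()
        for name in names
    ]
    full = pd.MultiIndex.from_product(levels, names=names)
    return s.reindex(full)


# Example data mirroring the question
category_idx = pd.Index(['A', 'B'])
date_idx = pd.date_range('2018-01', '2018-02', freq='MS')
idx = pd.MultiIndex.from_product([category_idx, date_idx], names=['category', 'date'])
series = pd.Series(np.random.randn(len(idx)), index=idx)

new_date_idx = pd.date_range('2018-01', '2018-03', freq='MS')
extended = reindex_level(series, 'date', new_date_idx)
print(extended)
```

Note that combinations missing from the original series (not just the new month) will also show up as NaN, since the product covers every pairing.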
Or unstack, reindex and stack:
s = series.unstack().reindex(columns=new_date_idx).stack(dropna=False)
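If stack gives trouble (I believe its dropna handling has changed across pandas versions), a variant that avoids it is to reindex each category separately and glue the pieces back together with concat. A minimal sketch, assuming the level names from the question:

```python
import numpy as np
import pandas as pd

# Example data mirroring the question
category_idx = pd.Index(['A', 'B'])
date_idx = pd.date_range('2018-01', '2018-02', freq='MS')
idx = pd.MultiIndex.from_product([category_idx, date_idx], names=['category', 'date'])
series = pd.Series(np.random.randn(len(idx)), index=idx)
new_date_idx = pd.date_range('2018-01', '2018-03', freq='MS')

# Reindex each category on its own plain DatetimeIndex, then rebuild the MultiIndex
pieces = {
    key: group.droplevel('category').reindex(new_date_idx)
    for key, group in series.groupby(level='category')
}
s2 = pd.concat(pieces, names=['category', 'date'])
print(s2)
```

This mirrors the series.loc['A'].reindex(...) behaviour from the question, applied once per category.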
Upvotes: 2