Iain Dillingham

Reputation: 625

Extending a datetime index within a multi-index

I'd like to extend a datetime index that's within a multi-index. However, passing level to reindex isn't working. For example, here's a series with a multi-index. I'd like to reindex the date level (a datetime index), to extend it by one month.

import numpy as np
import pandas as pd

category_idx = pd.Index(['A', 'B'])
date_idx = pd.date_range('2018-01', '2018-02', freq='MS')
idx = pd.MultiIndex.from_product([category_idx, date_idx], names=['category', 'date'])

series = pd.Series(np.random.randn(len(category_idx) * len(date_idx)), index=idx)
series
# category  date      
# A         2018-01-01    1.052776
#           2018-02-01   -0.032686
# B         2018-01-01    1.745934
#           2018-02-01   -0.759375
# dtype: float64

Here's the new date level, extended by one month.

new_date_idx = date_idx.union([date_idx[-1] + date_idx.freq])
new_date_idx
# DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01'], dtype='datetime64[ns]', freq='MS')

I'd expect the following to show that the series has two new rows, each containing NaN. However, nothing's changed.

series.reindex(index=new_date_idx, level='date')
# category  date      
# A         2018-01-01    1.052776
#           2018-02-01   -0.032686
# B         2018-01-01    1.745934
#           2018-02-01   -0.759375
# dtype: float64

I expected the behaviour to be the same as reindexing a single-level index, where the missing date appears as a new row containing NaN.

series.loc['A'].reindex(index=new_date_idx)
# 2018-01-01    1.052776
# 2018-02-01   -0.032686
# 2018-03-01         NaN
# Freq: MS, dtype: float64


Update: I've raised this question as an issue with Pandas: https://github.com/pandas-dev/pandas/issues/25460.

Upvotes: 4

Views: 2542

Answers (1)

jezrael

Reputation: 862406

It looks like a bug. The new value is present in the levels of the resulting MultiIndex, but no codes were added for it, so no new rows appear:

s = series.reindex(index=new_date_idx, level='date')
print(s.index)
# MultiIndex(levels=[['A', 'B'], [2018-01-01 00:00:00, 
#                                 2018-02-01 00:00:00, 
#                                 2018-03-01 00:00:00]],
#            codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
#            names=['category', 'date'])
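
To illustrate the point about levels and codes, here is a minimal sketch (not taken from the pandas source) that builds a similar MultiIndex by hand. The variable names are my own. It shows that a level can store a value that no row refers to, which matches what the output above looks like.

```python
import pandas as pd

dates = pd.date_range('2018-01-01', periods=3, freq='MS')

# Build a MultiIndex by hand whose 'date' level holds three values,
# while the codes only ever point at the first two of them.
mi = pd.MultiIndex(
    levels=[['A', 'B'], dates],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=['category', 'date'],
)

print(len(mi))                 # number of rows, driven by the codes
print(len(mi.levels[1]))       # number of stored level values

# remove_unused_levels() drops level values that no code references.
trimmed = mi.remove_unused_levels()
print(len(trimmed.levels[1]))
```

The number of rows follows the codes, not the levels, so adding a value to a level alone does not add any rows.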

A possible solution is to reindex with a full MultiIndex built from the existing categories and the new dates:

mux = pd.MultiIndex.from_product([series.index.levels[0], new_date_idx], 
                                 names=series.index.names)
s = series.reindex(mux)
print(s)
# category  date      
# A         2018-01-01    0.125677
#           2018-02-01    0.623794
#           2018-03-01         NaN
# B         2018-01-01    0.175913
#           2018-02-01    0.711070
#           2018-03-01         NaN
# dtype: float64

print(s.index)
# MultiIndex(levels=[['A', 'B'], [2018-01-01 00:00:00, 
#                                 2018-02-01 00:00:00, 
#                                 2018-03-01 00:00:00]],
#            codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
#            names=['category', 'date'])
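
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: `reindex_level_product` is a hypothetical name of my own, not a pandas function, and it assumes every level is named and that you want the full Cartesian product of the levels. If the original index is sparse, the product will also add rows for combinations that were missing before.

```python
import numpy as np
import pandas as pd


def reindex_level_product(obj, new_values, level):
    """Reindex `obj` on the product of its levels, with `level` replaced.

    Hypothetical helper: takes the unique values of every level, swaps in
    `new_values` for the named level, and reindexes on the full product.
    """
    names = list(obj.index.names)
    pos = names.index(level)
    levels = [
        obj.index.get_level_values(i).unique()
        for i in range(obj.index.nlevels)
    ]
    levels[pos] = pd.Index(new_values)
    full = pd.MultiIndex.from_product(levels, names=names)
    return obj.reindex(full)


# Same shape as the series in the question, with fixed values.
date_idx = pd.date_range('2018-01-01', periods=2, freq='MS')
idx = pd.MultiIndex.from_product([['A', 'B'], date_idx],
                                 names=['category', 'date'])
series = pd.Series(np.arange(4, dtype=float), index=idx)

new_date_idx = pd.date_range('2018-01-01', periods=3, freq='MS')
extended = reindex_level_product(series, new_date_idx, level='date')
print(extended)
```

Each category gains a 2018-03-01 row holding NaN, while the existing values are left as they were.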

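Or unstack the date level into columns, reindex the columns, and stack again, passing dropna=False so that the new NaN rows are kept:

s = series.unstack().reindex(columns=new_date_idx).stack(dropna=False)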

Upvotes: 2
