bevanj

Reputation: 254

Issue with reindexing a multiindex

I am struggling to reindex a multiindex. Example code below:

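import numpy as np
import pandas as pd

# Hourly series for 2000-2004 whose values are the day of the year
rng = pd.date_range('01/01/2000 00:00', '31/12/2004 23:00', freq='H')
ts = pd.Series([h.dayofyear for h in rng], index=rng)
daygrouped = ts.groupby(lambda x: x.dayofyear)
daymean = daygrouped.mean()
# Rotate the day-of-year index so that it starts at day 184
myindex = np.arange(1,367)
myindex = np.concatenate((myindex[183:],myindex[:183]))
daymean.reindex(myindex)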

gives (as expected):

184    184
185    185
186    186
187    187
...
180    180
181    181
182    182
183    183
Length: 366, dtype: int64

But if I create a MultiIndex:

hourgrouped = ts.groupby([lambda x: x.dayofyear, lambda x: x.hour])
hourmean = hourgrouped.mean()
myindex = np.arange(1,367)
myindex = np.concatenate((myindex[183:],myindex[:183]))
hourmean.reindex(myindex, level=1)

I get:

1  1     1
   2     1
   3     1
   4     1
...
366  20    366
     21    366
     22    366
     23    366
Length: 8418, dtype: int64

Any ideas on what my mistake is? Thanks.

Bevan

Upvotes: 1

Views: 73

Answers (1)

joris

Reputation: 139142

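First, you have to specify level=0 instead of level=1: the day of year is the first level, and levels are numbered from zero. With level=1 you are reindexing the hour level against 1..366, which only drops hour 0 (hence the length of 8418 = 366 * 23).
But there is still a problem: the reindexing works, but it does not seem to preserve the order of the provided index in the case of a MultiIndex: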

In [54]: hourmean.reindex([5,4], level=0)
Out[54]:
4  0     4
   1     4
   2     4
   3     4
   4     4
   ...
   20    4
   21    4
   22    4
   23    4
5  0     5
   1     5
   2     5
   3     5
   4     5
   ...
   20    5
   21    5
   22    5
   23    5
dtype: int64

So selecting a new subset of the index works, but the result keeps the order of the original index rather than the order of the index you provide.
This is possibly a bug in reindex on a specific level (I opened an issue to discuss this: https://github.com/pydata/pandas/issues/8241).
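
To check which level is which, and whether your pandas version preserves the requested order, a minimal sketch along these lines may help. It rebuilds hourmean from the question using array-based group keys instead of lambdas (my assumption; the grouping should be equivalent), and the names day_level, hour_level, subset and day_order are only illustrative.

```python
import numpy as np
import pandas as pd

# Hourly index for 2000-2004; a Timedelta is passed as the frequency
rng = pd.date_range("2000-01-01 00:00", "2004-12-31 23:00",
                    freq=pd.Timedelta(hours=1))
ts = pd.Series(np.asarray(rng.dayofyear), index=rng)

# Group by (day of year, hour); the result has a two-level MultiIndex
hourmean = ts.groupby([rng.dayofyear, rng.hour]).mean()

# Inspect the levels: level 0 is the day of year, level 1 is the hour
print(hourmean.index.nlevels)
day_level = hourmean.index.get_level_values(0).unique()
hour_level = hourmean.index.get_level_values(1).unique()
print(day_level.min(), day_level.max())
print(hour_level.min(), hour_level.max())

# Reindex on level 0 and look at the order of the days that come back
subset = hourmean.reindex([5, 4], level=0)
day_order = list(subset.index.get_level_values(0).unique())
print(day_order)  # [5, 4] if the order is preserved, [4, 5] if not
```

Whatever the printed order is, the subset itself should contain only days 4 and 5.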


A workaround for now is to build a full MultiIndex and reindex with that (so not on a specified level, but with the complete index, which does preserve the order). This is very easy with MultiIndex.from_product, as you already have myindex:

In [79]: myindex2 = pd.MultiIndex.from_product([myindex, range(24)])

In [82]: hourmean.reindex(myindex2)
Out[82]:
184  0     184
     1     184
     2     184
     3     184
     4     184
     5     184
     6     184
     7     184
     8     184
     9     184
     10    184
     11    184
     12    184
     13    184
     14    184
...
183  9     183
     10    183
     11    183
     12    183
     13    183
     14    183
     15    183
     16    183
     17    183
     18    183
     19    183
     20    183
     21    183
     22    183
     23    183
Length: 8784, dtype: int64
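
Put together as a standalone script, the workaround might look like the sketch below. It is not the exact session above: the group keys are passed as arrays rather than lambdas, and the names days, full_index and reordered are illustrative choices of mine.

```python
import numpy as np
import pandas as pd

# Hourly series for 2000-2004 whose values are the day of the year
rng = pd.date_range("2000-01-01 00:00", "2004-12-31 23:00",
                    freq=pd.Timedelta(hours=1))
ts = pd.Series(np.asarray(rng.dayofyear), index=rng)

# Mean per (day of year, hour)
hourmean = ts.groupby([rng.dayofyear, rng.hour]).mean()

# Desired day order: 184..366 followed by 1..183
days = np.arange(1, 367)
days = np.concatenate((days[183:], days[:183]))

# Build every (day, hour) pair in the desired order and reindex with the
# full MultiIndex instead of reindexing on a single level
full_index = pd.MultiIndex.from_product([days, range(24)])
reordered = hourmean.reindex(full_index)

print(reordered.head())
print(reordered.tail())
```

Because the full index lists the days in the rotated order, the result starts at day 184 and ends at day 183, each with hours 0 to 23.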

Upvotes: 1
