Pandas: Select from date in MultiIndex

Question

Say I have MultiIndex series as

date        foo
2006-01-01  1         12931926.310
            3         11084049.460
            5         10812205.359
            7          9031510.239
            9          5324054.903
2007-01-01  1         11086082.624
            3         12028419.560
            5         11957253.031
            7         10643307.061
            9          6034854.915

If it weren't a MultiIndex, I could select those with year 2007 through df.loc['2007']. How do I do this here? My natural guess was df.loc['2007', :], but that gave my an empty Series([], name: FINLWT21, dtype: float64).

Ultimate Goal

Ultimatively, I am also interested in replacing all rows for different dates than 2007 with the rows of the year 2007.

That is, my expected output is

date        foo
2006-01-01  1         11086082.624
            3         12028419.560
            5         11957253.031
            7         10643307.061
            9          6034854.915
2007-01-01  1         11086082.624
            3         12028419.560
            5         11957253.031
            7         10643307.061
            9          6034854.915

I tried to implement @unutbu's solution, but

mySeries.loc[dateIndex.year != 2007] = mySeries.loc[dateIndex.year == 2007]

will naturally set the values (due to non-existence on the RHS) to NaN. Usually, these issues are fixed by

mySeries.loc[dateIndex.year != 2007] = mySeries.loc[dateIndex.year == 2007].values

, but given that I have 10 values (and more in my real dataset) on the lefthand side, but only 5 on the right, I get

ValueError: cannot set using a list-like indexer with a different length than the value

The only alternative that comes to my mind now is to iterate over the first index and then use the previous command for each subgroup, but that doesn't appear to be the most efficient solution.

unutbu · Accepted Answer

Given the Series

In [207]: series
Out[212]: 
date        foo
2006-01-01  1      12931926.310
            3      11084049.460
            5      10812205.359
            7       9031510.239
            9       5324054.903
2007-01-01  1      11086082.624
            3      12028419.560
            5      11957253.031
            7      10643307.061
            9       6034854.915
Name: val, dtype: float64

you could extract the date index with

dateindex = series.index.get_level_values('date')
# Ensure the dateindex is a DatetimeIndex (as opposed to a plain Index)
dateindex = pd.DatetimeIndex(dateindex)

Now selection of the rows where the year equals 2007 can be done with a boolean condition:

# select rows where year equals 2007
series2007 = series.loc[dateindex.year == 2007]

If the foo values cycle through the same values in the same order for each date, then you could replace all the values in the series with those from 2007 with

N = len(series)/len(series2007)
series[:] = np.tile(series.loc[dateindex.year == 2007].values, N)

An advantage of using np.tile and .values is that it will generate the desired array of values relatively quickly. A (possible) disadvantage is that this is ignoring the index so it relies on the assumption that the foo values cycle through the same values in the same order for each date.

The more robust (but slower) way is to use a join:

df = series.reset_index('date')
df2007 = df.loc[dateindex.year==2007]
df = df.join(df2007, rsuffix='_2007')
df = df[['date', 'val_2007']]
df = df.set_index(['date'], append=True)
df = df.swaplevel(0,1).sort_index()

yields

In [304]: df.swaplevel(0,1).sort_index()
Out[304]: 
                    val_2007
date       foo              
2006-01-01 1    11086082.624
           3    12028419.560
           5    11957253.031
           7    10643307.061
           9     6034854.915
2007-01-01 1    11086082.624
           3    12028419.560
           5    11957253.031
           7    10643307.061
           9     6034854.915
2008-01-01 1    11086082.624
           3    12028419.560
           5    11957253.031
           7    10643307.061
           9     6034854.915

Pandas: Select from date in MultiIndex

Ultimate Goal

Answers (2)

Related Questions