Mophotla
Mophotla

Reputation: 127

How to shift multi-index column without repeating subcolumn names?

Let's say I have a dataframe with a multi-index column:

P = pd.DataFrame(
    [[100, 101],
     [101, 102],
     [ 98,  99]],
    columns=pd.MultiIndex.from_tuples(
        [('price', 'bid'),
         ('price', 'ask')]
    )
)
    
P

dataframe

and I want to add a new column which shows me the data from the previous row:

P['price_prev'] = P['price'].shift(1)

This throws the error

ValueError: Cannot set a DataFrame with multiple columns to the single column price_prev

I understand why this happens, and doing

P[[('price_prev', 'bid'), ('price_prev', 'ask')]] = P['price'].shift(1)

gives me what I want without errors:

enter image description here

But is there really no way to do this which avoids repeating the names of the subcolumns? I.e., telling pandas to copy the respective column including all of its subcolumns, renaming the top level to whatever was specified, and then shifting all of the data one row down?

Upvotes: 1

Views: 74

Answers (2)

Mophotla
Mophotla

Reputation: 127

@ziying35's answer does work, but only if I want to shift my entire dataframe.

Here's a similar and slightly less verbose version that also works for individual columns (in this case price):

P = P.join(P[['price']].shift(), rsuffix='_prev')

The one drawback of this compared to the explicit

P[[('price_prev', 'bid'), ('price_prev', 'ask')]] = P['price'].shift()

is a higher memory usage, so there seems to be a memory leak somewhere when using join. However, this might also just be my Jupyter Notebook acting up.

Upvotes: 0

ziying35
ziying35

Reputation: 1305

try this:

P.join(P.shift().rename(lambda x: f'{x}_prev', axis=1, level=0))

Upvotes: 2

Related Questions