Reputation: 127
Let's say I have a dataframe with a multi-index column:
P = pd.DataFrame(
[[100, 101],
[101, 102],
[ 98, 99]],
columns=pd.MultiIndex.from_tuples(
[('price', 'bid'),
('price', 'ask')]
)
)
P
and I want to add a new column which shows me the data from the previous row:
P['price_prev'] = P['price'].shift(1)
This throws the error
ValueError: Cannot set a DataFrame with multiple columns to the single column price_prev
I understand why this happens, and doing
P[[('price_prev', 'bid'), ('price_prev', 'ask')]] = P['price'].shift(1)
gives me what I want without errors:
But is there really no way to do this which avoids repeating the names of the subcolumns? I.e., telling pandas to copy the respective column including all of its subcolumns, renaming the top level to whatever was specified, and then shifting all of the data one row down?
Upvotes: 1
Views: 74
Reputation: 127
@ziying35's answer does work, but only if I want to shift my entire dataframe.
Here's a similar and slightly less verbose version that also works for individual columns (in this case price
):
P = P.join(P[['price']].shift(), rsuffix='_prev')
The one drawback of this compared to the explicit
P[[('price_prev', 'bid'), ('price_prev', 'ask')]] = P['price'].shift()
is a higher memory usage, so there seems to be a memory leak somewhere when using join
. However, this might also just be my Jupyter Notebook acting up.
Upvotes: 0
Reputation: 1305
try this:
P.join(P.shift().rename(lambda x: f'{x}_prev', axis=1, level=0))
Upvotes: 2