Reputation: 2678
For a MultiIndex
with a repeating level, how can I calculate the differences with another level of the index, effectively ignoring it?
Let me explain in code.
>>> ix = pd.MultiIndex.from_product([(0, 1, 2), (0, 1, 2, 3)])
>>> df = pd.DataFrame([5]*4 + [4]*4 + [3, 2, 1, 0], index=ix)
>>> df
0
0 0 5
1 5
2 5
3 5
1 0 4
1 4
2 4
3 4
2 0 3
1 2
2 1
3 0
Now by some operation I'd like to subtract the last set of values (2, 0:4)
from the whole data frame. I.e. df - df.loc[2]
to produce this:
0
0 0 2
1 3
2 4
3 5
1 0 1
1 2
2 3
3 4
2 0 0
1 0
2 0
3 0
But the statement produces an error. df - df.loc[2:3]
does not, but in addition to the trailing zeros only NaNs are produced - naturally of course because the indices don't match.
How could this be achieved?
I realised that the index level is precisely the problem. So I got a bit closer.
>>> df.droplevel(0) - df.loc[2]
0
0 2
0 1
0 0
1 3
1 2
1 0
2 4
2 3
2 0
3 5
3 4
3 0
Still not quite what I want. But I don't know if there's a convenient way of achieving what I'm after.
Upvotes: 2
Views: 79
Reputation: 150745
This with stack
and unstack
:
new_df = df.unstack()
new_df.sub(new_df.loc[2]).stack()
Output:
0
0 0 2
1 3
2 4
3 5
1 0 1
1 2
2 3
3 4
2 0 0
1 0
2 0
3 0
Upvotes: 2
Reputation: 75080
Try creating a dataframe with identical index and mapping the last set of data with the first level and populate across the dataframe , then substract:
df - pd.DataFrame(index=df.index,data=df.index.get_level_values(1).map(df.loc[2].squeeze()))
0
0 0 2
1 3
2 4
3 5
1 0 1
1 2
2 3
3 4
2 0 0
1 0
2 0
3 0
Upvotes: 1