Felix
Felix

Reputation: 2678

Difference of one multi index level

For a MultiIndex with a repeating level, how can I calculate the differences with another level of the index, effectively ignoring it?

Let me explain in code.

>>> ix = pd.MultiIndex.from_product([(0, 1, 2), (0, 1, 2, 3)])
>>> df = pd.DataFrame([5]*4 + [4]*4 + [3, 2, 1, 0], index=ix)
>>> df
     0
0 0  5
  1  5
  2  5
  3  5
1 0  4
  1  4
  2  4
  3  4
2 0  3
  1  2
  2  1
  3  0

Now by some operation I'd like to subtract the last set of values (2, 0:4) from the whole data frame. I.e. df - df.loc[2] to produce this:

     0
0 0  2
  1  3
  2  4
  3  5
1 0  1
  1  2
  2  3
  3  4
2 0  0
  1  0
  2  0
  3  0

But the statement produces an error. df - df.loc[2:3] does not, but in addition to the trailing zeros only NaNs are produced - naturally of course because the indices don't match.

How could this be achieved?


I realised that the index level is precisely the problem. So I got a bit closer.

>>> df.droplevel(0) - df.loc[2]
   0
0  2
0  1
0  0
1  3
1  2
1  0
2  4
2  3
2  0
3  5
3  4
3  0

Still not quite what I want. But I don't know if there's a convenient way of achieving what I'm after.

Upvotes: 2

Views: 79

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150745

This with stack and unstack:

new_df = df.unstack()
new_df.sub(new_df.loc[2]).stack()

Output:

     0
0 0  2
  1  3
  2  4
  3  5
1 0  1
  1  2
  2  3
  3  4
2 0  0
  1  0
  2  0
  3  0

Upvotes: 2

anky
anky

Reputation: 75080

Try creating a dataframe with identical index and mapping the last set of data with the first level and populate across the dataframe , then substract:

df - pd.DataFrame(index=df.index,data=df.index.get_level_values(1).map(df.loc[2].squeeze()))

     0
0 0  2
  1  3
  2  4
  3  5
1 0  1
  1  2
  2  3
  3  4
2 0  0
  1  0
  2  0
  3  0

Upvotes: 1

Related Questions