How add calculated column to multiindex dataframe

Question

This has probably already been answered in several places, but I still cannot get my example to work without resorting to several lines of code.

I have a dataframe with multiindex columns, and I would like to add a calculated column to to every level-0 group.

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo"]),
    np.array(["one", "two", "one", "two", "one", "two"]),
]

df = pd.DataFrame(np.linspace(0, 4*6-1, 4*6).reshape(4,6), columns=arrays)
print(df)

  bar     baz     foo    
  one two one two one two
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23

Specifically, I want to add a column division to bar, baz and foo which divides element-wise one and two.

The way I am doing it now is this:

# divide the numbers of the two dataframes (have to drop the first level)
idx = pd.IndexSlice
div_res = df.loc[:, idx[:, "one"]].droplevel(1, axis=1) / df.loc[
    :, idx[:, "two"]
].droplevel(1, axis=1)
div_res.columns = pd.MultiIndex.from_tuples([(y, "division") for y in div_res.columns])

# and concatenate the result again
result = (
    pd.concat([df, div_res], axis=1).sort_index(
        level=[0, 1],
        axis=1,
    )
)
print(result)

       bar                baz                foo          
  division  one  two division  one  two division  one  two
0      0.0  0.0  1.0      0.7  2.0  3.0      0.8  4.0  5.0
1      0.9  6.0  7.0      0.9  8.0  9.0      0.9 10.0 11.0
2      0.9 12.0 13.0      0.9 14.0 15.0      0.9 16.0 17.0
3      0.9 18.0 19.0      1.0 20.0 21.0      1.0 22.0 23.0

This works, but it seems to me that there should be a more efficient way to do this.

Thanks a bunch!

As a bonus, I have not figured out how to sort the second column level: one, two, division.

Shubham Sharma · Accepted Answer

Let us try reshaping with stack and unstack

s = df.stack(0)
s['div'] = s['one'].div(s['two'])
s.stack().unstack([1, 2])

    bar                   baz                   foo                
    one   two       div   one   two       div   one   two       div
0   0.0   1.0  0.000000   2.0   3.0  0.666667   4.0   5.0  0.800000
1   6.0   7.0  0.857143   8.0   9.0  0.888889  10.0  11.0  0.909091
2  12.0  13.0  0.923077  14.0  15.0  0.933333  16.0  17.0  0.941176
3  18.0  19.0  0.947368  20.0  21.0  0.952381  22.0  23.0  0.956522

How add calculated column to multiindex dataframe

Answers (2)

Related Questions