Reputation: 2300
This has probably already been answered in several places, but I still cannot get my example to work without resorting to several lines of code.
I have a dataframe with multiindex columns, and I would like to add a calculated column to to every level-0 group.
arrays = [
np.array(["bar", "bar", "baz", "baz", "foo", "foo"]),
np.array(["one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.linspace(0, 4*6-1, 4*6).reshape(4,6), columns=arrays)
print(df)
bar baz foo
one two one two one two
0 0 1 2 3 4 5
1 6 7 8 9 10 11
2 12 13 14 15 16 17
3 18 19 20 21 22 23
Specifically, I want to add a column division
to bar
, baz
and foo
which divides element-wise one
and two
.
The way I am doing it now is this:
# divide the numbers of the two dataframes (have to drop the first level)
idx = pd.IndexSlice
div_res = df.loc[:, idx[:, "one"]].droplevel(1, axis=1) / df.loc[
:, idx[:, "two"]
].droplevel(1, axis=1)
div_res.columns = pd.MultiIndex.from_tuples([(y, "division") for y in div_res.columns])
# and concatenate the result again
result = (
pd.concat([df, div_res], axis=1).sort_index(
level=[0, 1],
axis=1,
)
)
print(result)
bar baz foo
division one two division one two division one two
0 0.0 0.0 1.0 0.7 2.0 3.0 0.8 4.0 5.0
1 0.9 6.0 7.0 0.9 8.0 9.0 0.9 10.0 11.0
2 0.9 12.0 13.0 0.9 14.0 15.0 0.9 16.0 17.0
3 0.9 18.0 19.0 1.0 20.0 21.0 1.0 22.0 23.0
This works, but it seems to me that there should be a more efficient way to do this.
Thanks a bunch!
As a bonus, I have not figured out how to sort the second column level: one
, two
, division
.
Upvotes: 1
Views: 625
Reputation: 71689
Let us try reshaping with stack
and unstack
s = df.stack(0)
s['div'] = s['one'].div(s['two'])
s.stack().unstack([1, 2])
bar baz foo
one two div one two div one two div
0 0.0 1.0 0.000000 2.0 3.0 0.666667 4.0 5.0 0.800000
1 6.0 7.0 0.857143 8.0 9.0 0.888889 10.0 11.0 0.909091
2 12.0 13.0 0.923077 14.0 15.0 0.933333 16.0 17.0 0.941176
3 18.0 19.0 0.947368 20.0 21.0 0.952381 22.0 23.0 0.956522
Upvotes: 2
Reputation: 261580
You can groupby
on the columns axis and apply your transform:
(df.groupby(level=0, axis=1) # groupby first level of columns
.apply(lambda d: (d.xs('one', level=1, axis=1) # select "one"
.div(d.xs('two', level=1, axis=1)) # divide by "two"
).rename(columns=lambda x: 'division')) # col name
.join(df) # join with original data
.sort_index(level=0, axis=1)
)
output:
bar baz foo
division one two division one two division one two
0 0.000000 0.0 1.0 0.666667 2.0 3.0 0.800000 4.0 5.0
1 0.857143 6.0 7.0 0.888889 8.0 9.0 0.909091 10.0 11.0
2 0.923077 12.0 13.0 0.933333 14.0 15.0 0.941176 16.0 17.0
3 0.947368 18.0 19.0 0.952381 20.0 21.0 0.956522 22.0 23.0
Upvotes: 2