Reputation: 31
I want to multiply 2 columns (A*B) in a DataFrame where columns are pd.MultiIndex. I want to perform this multiplication for each DataX (Data1, Data2, ...) column in columns level=0.
df = pd.DataFrame(data= np.arange(32).reshape(8,4),
columns = pd.MultiIndex.from_product(iterables = [["Data1","Data2"],["A","B"]]))
Data1 Data2
A B A B
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
5 20 21 22 23
6 24 25 26 27
7 28 29 30 31
The result of multiplication should be also a DataFrame with columns=pd.MultiIndex (see below).
Data1 Data2 Data1 Data2
A B A B A*B A*B
0 0 1 2 3 0 6
1 4 5 6 7 20 42
2 8 9 10 11 72 110
3 12 13 14 15 156 210
4 16 17 18 19 272 342
5 20 21 22 23 420 506
6 24 25 26 27 600 702
7 28 29 30 31 812 930
I managed to perform this multiplication by iterating over columns, level=0,but looking a better way to do it.
for _ in df.columns.get_level_values(level=0).unique().tolist()[:]:
df[(_, "A*B")] = df[(_, "A")] * df[(_, "B")]
Any suggestions or hints much appreciated! Thanks
Upvotes: 3
Views: 113
Reputation: 28679
Another alternative here, using prod :
df[("Data1", "A*B")] = df.loc(axis=1)["Data1"].prod(axis=1)
df[("Data2", "A*B")] = df.loc(axis=1)["Data2"].prod(axis=1)
df
Data1 Data2 Data1 Data2
A B A B A*B A*B
0 0 1 2 3 0 6
1 4 5 6 7 20 42
2 8 9 10 11 72 110
3 12 13 14 15 156 210
4 16 17 18 19 272 342
5 20 21 22 23 420 506
6 24 25 26 27 600 702
7 28 29 30 31 812 930
Upvotes: 1
Reputation: 59549
Slice out the 'A' and 'B' along the first level of the columns Index. Then you can multiply which will align on the 0th level ('Data1', 'Data2'). We'll then re-create the MultiIndex on the columns and join back
df1 = df.xs('A', axis=1, level=1).multiply(df.xs('B', axis=1, level=1))
df1.columns = pd.MultiIndex.from_product([df1.columns, ['A*B']])
df = pd.concat([df, df1], axis=1)
Here are some timings assuming you have 2 groups (Data1, Data2) and your DataFrame just gets longer. Turns out, the simple loop might be the fastest of them all. (I added some sorting and needed to copy them all so the output is the same).
import perfplot
import pandas as pd
import numpy as np
#@Tom
def simple_loop(df):
for _ in df.columns.get_level_values(level=0).unique().tolist()[:]:
df[(_, "A*B")] = df[(_, "A")] * df[(_, "B")]
return df.sort_index(axis=1)
#@Roy2012
def mul_with_stack(df):
df = df.stack(level=0)
df["A*B"] = df.A * df.B
return df.stack().swaplevel().unstack(level=[2,1]).sort_index(axis=1)
#@Alollz
def xs_concat(df):
df1 = df.xs('A', axis=1, level=1).multiply(df.xs('B', axis=1, level=1))
df1.columns = pd.MultiIndex.from_product([df1.columns, ['A*B']])
return pd.concat([df, df1], axis=1).sort_index(axis=1)
#@anky
def prod_join(df):
u = df.prod(axis=1,level=0)
u.columns=pd.MultiIndex.from_product((u.columns,['*'.join(df.columns.levels[1])]))
return df.join(u).sort_index(axis=1)
perfplot.show(
setup=lambda n: pd.DataFrame(data=np.arange(4*n).reshape(n, 4),
columns =pd.MultiIndex.from_product(iterables=[["Data1", "Data2"], ["A", "B"]])),
kernels=[
lambda df: simple_loop(df.copy()),
lambda df: mul_with_stack(df.copy()),
lambda df: xs_concat(df.copy()),
lambda df: prod_join(df.copy())
],
labels=['simple_loop', 'stack_and_multiply', 'xs_concat', 'prod_join'],
n_range=[2 ** k for k in range(3, 20)],
equality_check=np.allclose,
xlabel="len(df)"
)
Upvotes: 4
Reputation: 12503
Here's a way to do it with stack
and unstack
. The advantage: fully vectorized, no loops, no join operations.
t = df.stack(level=0)
t["A*B"] = t.A * t.B
t = t.stack().swaplevel().unstack(level=[2,1])
The output is:
Data1 Data2
A B A*B A B A*B
0 0 1 0 2 3 6
1 4 5 20 6 7 42
2 8 9 72 10 11 110
3 12 13 156 14 15 210
4 16 17 272 18 19 342
Upvotes: 2
Reputation: 75080
Here is another alternative using df.prod
and df.join
u = df.prod(axis=1,level=0)
u.columns=pd.MultiIndex.from_product((u.columns,['*'.join(df.columns.levels[1])]))
out = df.join(u)
Data1 Data2 Data1 Data2
A B A B A*B A*B
0 0 1 2 3 0 6
1 4 5 6 7 20 42
2 8 9 10 11 72 110
3 12 13 14 15 156 210
4 16 17 18 19 272 342
5 20 21 22 23 420 506
6 24 25 26 27 600 702
7 28 29 30 31 812 930
Upvotes: 4