Tom
Tom

Reputation: 31

Multiply columns in Dataframe where columns are pd.MultiIndex

I want to multiply 2 columns (A*B) in a DataFrame where columns are pd.MultiIndex. I want to perform this multiplication for each DataX (Data1, Data2, ...) column in columns level=0.

df = pd.DataFrame(data= np.arange(32).reshape(8,4), 
                  columns = pd.MultiIndex.from_product(iterables = [["Data1","Data2"],["A","B"]]))

    Data1   Data2
    A   B   A   B
0   0   1   2   3
1   4   5   6   7
2   8   9   10  11
3   12  13  14  15
4   16  17  18  19
5   20  21  22  23
6   24  25  26  27
7   28  29  30  31

The result of multiplication should be also a DataFrame with columns=pd.MultiIndex (see below).

    Data1   Data2   Data1   Data2
    A   B   A   B   A*B     A*B
0   0   1   2   3   0       6
1   4   5   6   7   20      42
2   8   9   10  11  72      110
3   12  13  14  15  156     210
4   16  17  18  19  272     342
5   20  21  22  23  420     506
6   24  25  26  27  600     702
7   28  29  30  31  812     930

I managed to perform this multiplication by iterating over columns, level=0,but looking a better way to do it.

for _ in df.columns.get_level_values(level=0).unique().tolist()[:]:
    df[(_, "A*B")] = df[(_, "A")] * df[(_, "B")]

Any suggestions or hints much appreciated! Thanks

Upvotes: 3

Views: 113

Answers (4)

sammywemmy
sammywemmy

Reputation: 28679

Another alternative here, using prod :

df[("Data1", "A*B")] = df.loc(axis=1)["Data1"].prod(axis=1)
df[("Data2", "A*B")] = df.loc(axis=1)["Data2"].prod(axis=1)

df

   Data1    Data2 Data1 Data2
   A    B   A   B  A*B  A*B
0   0   1   2   3   0   6
1   4   5   6   7   20  42
2   8   9   10  11  72  110
3   12  13  14  15  156 210
4   16  17  18  19  272 342
5   20  21  22  23  420 506
6   24  25  26  27  600 702
7   28  29  30  31  812 930

Upvotes: 1

ALollz
ALollz

Reputation: 59549

Slice out the 'A' and 'B' along the first level of the columns Index. Then you can multiply which will align on the 0th level ('Data1', 'Data2'). We'll then re-create the MultiIndex on the columns and join back

df1 = df.xs('A', axis=1, level=1).multiply(df.xs('B', axis=1, level=1))
df1.columns = pd.MultiIndex.from_product([df1.columns, ['A*B']])

df = pd.concat([df, df1], axis=1)

Here are some timings assuming you have 2 groups (Data1, Data2) and your DataFrame just gets longer. Turns out, the simple loop might be the fastest of them all. (I added some sorting and needed to copy them all so the output is the same).

import perfplot
import pandas as pd
import numpy as np

#@Tom
def simple_loop(df):
    for _ in df.columns.get_level_values(level=0).unique().tolist()[:]:
        df[(_, "A*B")] = df[(_, "A")] * df[(_, "B")]
    return df.sort_index(axis=1)

#@Roy2012
def mul_with_stack(df):
    df = df.stack(level=0)
    df["A*B"] = df.A * df.B
    return df.stack().swaplevel().unstack(level=[2,1]).sort_index(axis=1)

#@Alollz
def xs_concat(df):
    df1 = df.xs('A', axis=1, level=1).multiply(df.xs('B', axis=1, level=1))
    df1.columns = pd.MultiIndex.from_product([df1.columns, ['A*B']])

    return pd.concat([df, df1], axis=1).sort_index(axis=1)

#@anky
def prod_join(df):
    u = df.prod(axis=1,level=0)
    u.columns=pd.MultiIndex.from_product((u.columns,['*'.join(df.columns.levels[1])]))
    return df.join(u).sort_index(axis=1)


perfplot.show(
    setup=lambda n: pd.DataFrame(data=np.arange(4*n).reshape(n, 4), 
                                 columns =pd.MultiIndex.from_product(iterables=[["Data1", "Data2"], ["A", "B"]])), 
    kernels=[
        lambda df: simple_loop(df.copy()),
        lambda df: mul_with_stack(df.copy()),
        lambda df: xs_concat(df.copy()),
        lambda df: prod_join(df.copy())
    ],
    labels=['simple_loop', 'stack_and_multiply', 'xs_concat', 'prod_join'],
    n_range=[2 ** k for k in range(3, 20)],
    equality_check=np.allclose, 
    xlabel="len(df)"
)

enter image description here

Upvotes: 4

Roy2012
Roy2012

Reputation: 12503

Here's a way to do it with stack and unstack. The advantage: fully vectorized, no loops, no join operations.

t = df.stack(level=0)
t["A*B"] = t.A * t.B
t = t.stack().swaplevel().unstack(level=[2,1])

The output is:

  Data1          Data2         
      A   B  A*B     A   B  A*B
0     0   1    0     2   3    6
1     4   5   20     6   7   42
2     8   9   72    10  11  110
3    12  13  156    14  15  210
4    16  17  272    18  19  342

Upvotes: 2

anky
anky

Reputation: 75080

Here is another alternative using df.prod and df.join

u = df.prod(axis=1,level=0)
u.columns=pd.MultiIndex.from_product((u.columns,['*'.join(df.columns.levels[1])]))
out = df.join(u)

  Data1     Data2     Data1 Data2
      A   B     A   B   A*B   A*B
0     0   1     2   3     0     6
1     4   5     6   7    20    42
2     8   9    10  11    72   110
3    12  13    14  15   156   210
4    16  17    18  19   272   342
5    20  21    22  23   420   506
6    24  25    26  27   600   702
7    28  29    30  31   812   930

Upvotes: 4

Related Questions