Cedric

Reputation: 79

Python: Multiplying a dataframe with another dataframe

Hi, I currently have two DataFrames with different shapes:

import numpy as np
import pandas as pd

df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
    a   b   c
0   1   2   3
1   4   5   6
2   7   8   9

df12 = pd.DataFrame(np.array([[7, 8, 9]]),
                   columns=['a', 'b', 'c'])

    a   b   c
0   7   8   9

I would like to multiply each row in df11 by the single row in df12, so the resulting DataFrame should be:

df13 = pd.DataFrame(np.array([[7, 16, 27], [28, 40, 54], [49, 64, 81]]),
                   columns=['a', 'b', 'c'])

    a   b   c
0   7   16  27
1   28  40  54
2   49  64  81

Upvotes: 1

Views: 964

Answers (3)

OTheDev

Reputation: 2967

One-liner

df_3 = df_1 * df_2.iloc[0]

Code

import pandas as pd

data_1 = {'a': [1, 4, 7],
          'b': [2, 5, 8],
          'c': [3, 6, 9]}
data_2 = {'a': [7], 'b': [8], 'c': [9]}
df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)

df_3 = df_1 * df_2.iloc[0]
print(df_3)

Output

    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81

Timings

A few timings for this input.

# Paul_O's numpy approach
25.9 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# iloc approach
172 µs ± 962 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# mozway's approach 
194 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# Paul_O's mul approach
308 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With data_1 replaced by a 10000 x 3 DataFrame of random integers between 1 and 10000, the results are very similar.

# Paul_O's numpy approach
39 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# iloc approach
188 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# mozway's approach
206 µs ± 2.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# Paul_O's mul approach
312 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Of course, these are only two sets of timings, for two very specific inputs, on one system, so I would not draw hard conclusions from them. Still, if your problem is very similar to this one, the numpy approach appears to be the fastest. The best approach may differ in other circumstances, e.g., if the form of your input differs.
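If you want to reproduce timings like these on your own machine, something along these lines should work. This is only a rough sketch using the standard-library timeit module rather than IPython's %timeit, so the numbers will not match the ones above. The seed, the `approaches` dict, and the `time_approaches` helper are names I made up for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: a 10000 x 3 frame of random integers in [1, 10000].
rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
df_1 = pd.DataFrame(rng.integers(1, 10001, size=(10000, 3)),
                    columns=['a', 'b', 'c'])
df_2 = pd.DataFrame({'a': [7], 'b': [8], 'c': [9]})

# The four approaches from this thread, wrapped as zero-argument callables.
approaches = {
    'numpy': lambda: pd.DataFrame(df_1.to_numpy() * df_2.to_numpy(),
                                  columns=df_1.columns),
    'iloc': lambda: df_1 * df_2.iloc[0],
    'squeeze': lambda: df_1 * df_2.squeeze(),
    'mul': lambda: df_1.mul({'a': 7, 'b': 8, 'c': 9}),
}


def time_approaches(number=50):
    """Return the mean seconds per call for each approach."""
    return {name: timeit.timeit(fn, number=number) / number
            for name, fn in approaches.items()}


timings = time_approaches()
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Absolute numbers depend heavily on the pandas and numpy versions, so only the ordering between approaches is really worth comparing.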

Upvotes: 2

mozway

Reputation: 260790

You can use squeeze:

df13 = df11*df12.squeeze()

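The potential advantage is that squeeze only collapses df12 to a Series when it has a single row. If df12 has more than one row, it stays a DataFrame and the multiplication is performed element-wise in 2D, aligned on index and columns.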

output:

    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
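To make the difference concrete, here is a small sketch of what squeeze returns in each case. The frames `one_row` and `three_rows` are made-up inputs for illustration, not part of the question.

```python
import pandas as pd

df11 = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})

# One row: squeeze() collapses the frame to a Series indexed by column name,
# so the multiplication broadcasts that row across every row of df11.
one_row = pd.DataFrame({'a': [7], 'b': [8], 'c': [9]})
squeezed_row = one_row.squeeze()
row_result = df11 * squeezed_row

# Several rows (and several columns): squeeze() has nothing to collapse,
# so the result is still a DataFrame and the product is element-wise,
# aligned on both index and column labels.
three_rows = pd.DataFrame({'a': [7, 1, 1], 'b': [8, 1, 1], 'c': [9, 1, 1]})
squeezed_frame = three_rows.squeeze()
elementwise = df11 * squeezed_frame

print(type(squeezed_row).__name__)    # Series
print(type(squeezed_frame).__name__)  # DataFrame
print(elementwise)
```

In the second case only the first row of df11 is scaled by 7, 8, 9. The other rows are multiplied by 1, because each cell is paired with the cell that has the same index and column label.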

Upvotes: 1

Paul_0

Reputation: 358

I recommend using numpy multiplication:

df13 = pd.DataFrame(df11.to_numpy()*df12.to_numpy(), columns=df11.columns)

Or you can use the pandas mul method, like this:

df11.mul({'a': 7, 'b': 8, 'c': 9})
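One thing to keep in mind when choosing between the two: the numpy version multiplies by position and ignores column labels, while the pandas version aligns on labels. A small sketch of the difference, using a hypothetical `df12_reordered` whose columns are in a different order than in the question:

```python
import numpy as np
import pandas as pd

df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])

# Hypothetical input: same values per label as df12, but columns reordered.
df12_reordered = pd.DataFrame(np.array([[9, 8, 7]]),
                              columns=['c', 'b', 'a'])

# numpy broadcasting: purely positional, so column 'a' is multiplied by 9.
by_position = pd.DataFrame(df11.to_numpy() * df12_reordered.to_numpy(),
                           columns=df11.columns)

# pandas mul with a Series: matched on labels, so column 'a' is multiplied by 7.
by_label = df11.mul(df12_reordered.iloc[0], axis='columns')
by_label = by_label[df11.columns]  # restore df11's column order explicitly

print(by_position)
print(by_label)
```

If both frames are guaranteed to have the same column order, the two give the same result, and the numpy route skips the alignment work.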

Upvotes: 2
