Multiply many columns by one column in dask

Question

I want to multiply roughly 50,000 columns by one other column in a large dask dataframe (6_500_000 x 50_002). A solution using a for loop works but is painfully slow. Below are two other approaches I tried, both of which failed. Any advice is appreciated.

Pandas

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]})
df[['a','b']].multiply(df['c'], axis="index")

Dask

import dask.dataframe as dd
ddf = dd.from_pandas(df, npartitions=1)

# works but very slow for large datasets: 
for column in ['a', 'b']:
    ddf[column] = ddf[column] * ddf['c']

# these do not work:
ddf[['a','b']].multiply(ddf['c'], axis="index") 
ddf[['a', 'b']].map_partitions(pd.DataFrame.mul, other=ddf['c'] ).compute()

David Erickson · Accepted Answer

Use .mul on the dask dataframe and assign the result back to the selected columns:

import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]})
ddf = dd.from_pandas(df, npartitions=1)

ddf[['a','b']] = ddf[['a','b']].mul(ddf['c'], axis='index') # or axis=0

ddf.compute()
Out[1]: 
    a   b  c
0   7  28  7
1  16  40  8
2  27  54  9
