Roman Kazmin

Reputation: 981

Dask: applying custom function to DataFrame gets error

I'd like to speed up my DataFrame manipulations and decided to try the dask library for this, but I can't get it to work. Here is a test example that shows the problem:

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.multiprocessing import get

def testfunc(good):
  return good*good

df = pd.DataFrame({'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,8,9]})
ddata = dd.from_pandas(df, npartitions=2)

df1 = ddata.map_partitions(lambda df: df.apply((lambda row: testfunc(*row)), axis=1)).compute(get=get)

When I run this code I get an error: TypeError: testfunc() takes 1 positional argument but 3 were given. Could you explain what is wrong with my code?

Upvotes: 0

Views: 156

Answers (1)

Nick Becker

Reputation: 4214

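This will work with a minor change. The asterisk in testfunc(*row) unpacks the row, so its three values (columns a, b and c) are passed as three separate positional arguments, while testfunc accepts only one. You probably want to pass the row directly, as is: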

import numpy as np
import pandas as pd
import dask.dataframe as dd

def testfunc(good):
    return good*good

df = pd.DataFrame({'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,8,9]})
ddata = dd.from_pandas(df, npartitions=2)

df1 = ddata.map_partitions(lambda df: df.apply((lambda row: testfunc(row)), axis=1)).compute()
print(df1)
   a   b   c
0  1  16  49
1  4  25  64
2  9  36  81

For more information on what the asterisk does in a function call, you might want to check out the Expressions section of the Python docs.

Upvotes: 1
