Reputation: 981
I'd like to speed up my DataFrame manipulations and decided to use the dask library for this, but I cannot get it to work. I have made a test example to show my problem:
import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.multiprocessing import get
def testfunc(good):
    return good*good
df = pd.DataFrame({'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,8,9]})
ddata = dd.from_pandas(df, npartitions=2)
df1 = ddata.map_partitions(lambda df: df.apply((lambda row: testfunc(*row)), axis=1)).compute(get=get)
But when I run this code I get an error: TypeError: testfunc() takes 1 positional argument but 3 were given. Could you explain what is wrong in my code?
Upvotes: 0
Views: 156
Reputation: 4214
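This will work with a minor change. You're currently unpacking the row
object with the asterisk, so testfunc(*row) passes the row's three values as three separate positional arguments, while testfunc accepts only one. You probably want to pass the row directly, as is.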
import numpy as np
import pandas as pd
import dask.dataframe as dd
def testfunc(good):
    return good*good
df = pd.DataFrame({'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,8,9]})
ddata = dd.from_pandas(df, npartitions=2)
df1 = ddata.map_partitions(lambda df: df.apply((lambda row: testfunc(row)), axis=1)).compute()
print(df1)
   a   b   c
0  1  16  49
1  4  25  64
2  9  36  81
For more information, you might want to check out the Python docs on expressions.
Upvotes: 1