Reputation: 12406
I am trying to store a Dask dataframe with a categorical column in an *.h5
file, following this tutorial (1:23:25 - 1:23:45).
Here is my call to a store function:
stored = store(ddf, '/home/HdPC/Analyzed.h5', ['Tag'])
The function store is:
@delayed
def store(ddf, fp, c):
    ddf.categorize(columns=c).to_hdf(fp, '/data2')
It uses categorize. ddf and stored are of type:
print(type(ddf), type(stored))
>>> (<class 'dask.dataframe.core.DataFrame'>, <class 'dask.delayed.Delayed'>)
When I run compute(*[stored]) or stored.compute(), I get this:
dask.async.AttributeError: 'DataFrame' object has no attribute 'categorize'
Is there a way to categorize the Tag column inside the store function? Or should I use a different method to store a Dask dataframe with a categorical column?
Upvotes: 1
Views: 1127
Reputation: 28683
I would suggest you try the dataframe operations without the delayed call: Dask dataframes are already lazy compute graphs internally. I believe that when you call compute, Dask first evaluates the ddf argument and passes the resulting pandas dataframe to your function, which is why you get the error (a pandas DataFrame has no categorize method).
In your case: simply remove @delayed (remembering that to_hdf is a blocking call).
Upvotes: 2