anon_swe

Reputation: 9335

Dask: DataFrame taking forever to compute

I created a Dask dataframe from a Pandas dataframe with ~50K rows and 5 columns:

import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=32)

I then add a bunch of columns (~30) to the dataframe and try to turn it back into a Pandas dataframe:

import dask.multiprocessing

DATA = ddf.compute(get=dask.multiprocessing.get)

I looked at the docs, and if I don't specify num_workers it defaults to using all of my cores. I'm on a 64-core EC2 instance, and the line above has already been running for several minutes without finishing...
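For reference, the worker count can also be capped explicitly; a minimal sketch, assuming a Dask version where the older get= keyword is still accepted and num_workers is forwarded to the multiprocessing scheduler (8 is an illustrative value, not a recommendation):

# Sketch: cap the multiprocessing scheduler at 8 worker processes
# instead of the default of one per core.
DATA = ddf.compute(get=dask.multiprocessing.get, num_workers=8)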

Any idea how to speed up or what I'm doing incorrectly?

Thanks!

Upvotes: 4

Views: 757

Answers (1)

msarafzadeh

Reputation: 395

I'd suggest lowering the number of threads and increasing the number of processes to speed things up.
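A minimal sketch of that idea, assuming dask.distributed is installed and reusing the question's ddf; the worker counts are illustrative, not tuned values:

# Sketch: many single-threaded worker processes via a local distributed cluster.
# n_workers / threads_per_worker are assumed values to illustrate the trade-off.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=32, threads_per_worker=1)
client = Client(cluster)

DATA = ddf.compute()  # compute() now runs on the client created above

Alternatively, on recent Dask versions the bundled processes scheduler can be selected directly with ddf.compute(scheduler="processes", num_workers=32).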

Upvotes: 2
