Jyotsna_b

Reputation: 1053

DASK Dataframe within loops

I'm having some trouble trying to implement loops in Dask, for example in the following code:

cols_constant = []
for i in range(len(col)):  # col is the list of column names
    if df[col[i]].dtype == 'object':
        pass
    elif df[col[i]].std().compute() == 0:  # triggers a full computation per column
        cols_constant.append(col[i])
df = df.drop(cols_constant, axis=1)

The same code runs very quickly with pandas, but with Dask it takes a considerable amount of time to complete.

I understand that Dask is inefficient with loops, but how can I optimize code like the above for Dask?

I cannot use df.persist() since we intend to do the computation on multiple worker systems.

Would it be useful to use the dask.do function to parallelize this task?

Upvotes: 1

Views: 2034

Answers (1)

MRocklin

Reputation: 57281

Every time you call df.column.std().compute() you incur not only the cost of calling std() but also the cost of creating df. If you created df from a pandas dataframe then this is cheap, but if you created df by some more expensive process, like reading in CSV files, then it can be very expensive.

import dask.dataframe as dd

df = dd.from_pandas(...)  # OK to call compute many times; the data is already in memory
df = dd.read_csv(...)     # slow to call compute many times; all the CSV files are re-read on every compute call

If you have the memory then you can avoid this repeated cost by calling persist:

df = df.persist()

In your question you say that you can't use persist because you plan to run this on multi-worker systems. To be clear, if you have enough memory available, persist works in all cases, both single-worker and multi-worker; on a distributed cluster it keeps the data in the workers' memory.
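For example, here is a minimal sketch of persisting on a multi-worker cluster; the scheduler address, file pattern, and column name are placeholders:

import dask.dataframe as dd
from dask.distributed import Client

client = Client('tcp://scheduler-address:8786')  # placeholder address of an existing cluster

df = dd.read_csv('data-*.csv')  # placeholder file pattern
df = df.persist()  # the partitions now live in the workers' memory

# later compute calls reuse the persisted partitions instead of re-reading the CSVs
print(df['some_column'].std().compute())  # 'some_column' is a placeholder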

You can also avoid repeated calls to compute by calling compute only once:

import dask

stds = [df[column].std() for column in df.columns]
(stds,) = dask.compute(stds)  # one pass computes all the stds at once

This computes everything in a single pass over the data.
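Applied to the loop in the question, a sketch of this single-compute approach (assuming df is a Dask dataframe) might look like:

import dask

# collect the lazy std() tasks for every non-object column
numeric_cols = [c for c in df.columns if df[c].dtype != 'object']
(stds,) = dask.compute([df[c].std() for c in numeric_cols])

# drop the columns whose standard deviation is zero
cols_constant = [c for c, s in zip(numeric_cols, stds) if s == 0]
df = df.drop(cols_constant, axis=1)

This replaces one compute call per column with a single compute call for the whole dataframe.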

Upvotes: 3
