Reputation: 359
I am using Dask DataFrame to parallelize the following regex-search code.
import re
import dask.dataframe as dd

ddf = dd.from_pandas(in_desc, npartitions=16)

def r_s(dataframe1):
    for vals in dataframe1:
        for regex in dataframe.values:  # dataframe holds the regex patterns
            if re.search(regex[0], vals):
                pass
    return dataframe1

res = ddf.map_partitions(r_s, meta=ddf)
res.compute()
in_desc and dataframe are both pandas DataFrames; dataframe1 receives one partition of ddf inside r_s.
On checking core utilization with mpstat -P ALL 1, I noticed that no single core out of the 16 went above 20%, although the summed utilization across all cores was roughly 100%.
Is it possible to push all the cores above 50% utilization with Dask? If so, how should I modify my code to achieve that?
Thanks.
Upvotes: 1
Views: 1851
Reputation: 57281
The default scheduler for dask dataframe uses multiple threads. This is the right choice for most pandas computations, especially vectorized numeric operations, but not for all of them.
Your computation, however, is mostly pure Python code, so it is limited by the GIL. I recommend that you use the multiprocessing scheduler instead:
res.compute(scheduler='processes')
Upvotes: 2