Reputation: 737

No significant improvement between 4 threads and 12 threads CPU

I tried to compare 2 executions of the same Dask code on my 2 computers.

First computer is equiped with Intel I5 6500 4 threads / 16GB Ram, and the other one is a MBP with Intel I7 8750H 12 threads and / 16GB Ram.

Here is the function :

%%timeit
my_df.map_partitions(
    lambda df: df.apply(
        lambda x: detect(str(x['Description'])), axis=1)).\
compute(scheduler='processes')

I was expecting the execution to be approx. 3 times faster but I got those results for 10 000 rows :

22.9 seconds for the I5 6500
21.6 seconds for the I7 8750H

Same partition size and same parameters.

How is it possible ?

Upvotes: 2

Answers (1)

mdurant

Reputation: 28683

A few notes to help you understand the situation. I note that you do not specify how the data is loaded or what detect is.

i7 8750H has six cores, not 12. I has 12 "hyperthreads", meaning that a core can rapidly swap between threads, but you do not get 12x the processing power of one core. i5 6500 has 4 cores. You should find out how many workers Dask is creating.
you compute the whole result, which means copying the partitions back to the main client thread and concatenating them. There is no parallelism in this.
maybe you are, similarly, loading all the data in the client, and have to split and copy it to workers before starting to process.
you should try with the distributed scheduler, where you can specify the mix of processes and threads, and get much better feedback on what is happening via the dashboard. Sometimes, this scheduler also performs better, since it has better internal organisation and optimisation for the graph and where to run tasks.

With the distributed scheduler, you can time just the processing part

client = distributed.Client(...)
p_df = my_df.persist()
distributed.wait(p_df)

# time this bit
out = p_df.map_partitions(
lambda df: df.apply(
    lambda x: detect(str(x['Description'])), axis=1)).persist()
distributed.wait(out)

Upvotes: 3

No significant improvement between 4 threads and 12 threads CPU

Answers (1)

Related Questions