Zakery Clarke

Reputation: 55

Speed of Promisified Web Workers

I need to compute a mathematically intensive function many (>10,000) times. I thought I'd use web workers to cut down the computation time.

I am using a helper that turns a function into a blob, wraps its execution in a promise, and runs a web worker from that blob. I tested it and it works, but it runs significantly slower than the single-threaded approach.

SingleThreaded: 3 milliseconds
MultiThreaded: 5524 milliseconds

Complete code including time test:

https://codepen.io/zakerytclarke/pen/BgRyBm?editors=0012

This code calculates the first n squares and pushes them to an array. The console shows the respective times for the single-threaded and multi-threaded runs.

This is the function I am using to promisify web workers. Is there something wrong with it that causes the execution time to be so much longer than a simple for loop?

function thread(fn) {
  return function (args) {
    return new Promise(function (resolve) {
      var worker = new Worker(URL.createObjectURL(
        new Blob(['(' + fn + ')(' + JSON.stringify(args) + ')'])
      ));
      worker.postMessage(args);
      worker.onmessage = function (event) {
        resolve(event.data);
        worker.terminate();
      };
    });
  };
}
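
For reference, the benchmark presumably drives this wrapper along these lines (a sketch based on the description above; the exact CodePen code may differ, and the squareWorker/multiThreaded names are just illustrative):

var squareWorker = thread(function (n) {
  // runs inside the worker: compute the square and post it back
  postMessage(n * n);
});

async function multiThreaded(n) {
  var t0 = performance.now();
  var jobs = [];
  for (var i = 0; i < n; i++) {
    jobs.push(squareWorker(i)); // spawns one worker per number
  }
  var results = await Promise.all(jobs);
  console.log('MultiThreaded:', performance.now() - t0, 'ms');
  return results;
}

function singleThreaded(n) {
  var t0 = performance.now();
  var results = [];
  for (var i = 0; i < n; i++) {
    results.push(i * i);
  }
  console.log('SingleThreaded:', performance.now() - t0, 'ms');
  return results;
}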

Thanks for your help.

Here is the info on my CPU, in case that matters:

cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 76
model name  : Intel(R) Atom(TM) x7-Z8700  CPU @ 1.60GHz
stepping    : 3
microcode   : 0x367
cpu MHz     : 901.401
cache size  : 1024 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat
bugs        : cpu_meltdown spectre_v1 spectre_v2
bogomips    : 3200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor  : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 76
model name  : Intel(R) Atom(TM) x7-Z8700  CPU @ 1.60GHz
stepping    : 3
microcode   : 0x367
cpu MHz     : 875.272
cache size  : 1024 KB
physical id : 0
siblings    : 4
core id     : 1
cpu cores   : 4
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat
bugs        : cpu_meltdown spectre_v1 spectre_v2
bogomips    : 3200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 76
model name  : Intel(R) Atom(TM) x7-Z8700  CPU @ 1.60GHz
stepping    : 3
microcode   : 0x367
cpu MHz     : 860.525
cache size  : 1024 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 4
apicid      : 4
initial apicid  : 4
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat
bugs        : cpu_meltdown spectre_v1 spectre_v2
bogomips    : 3200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor  : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 76
model name  : Intel(R) Atom(TM) x7-Z8700  CPU @ 1.60GHz
stepping    : 3
microcode   : 0x367
cpu MHz     : 557.593
cache size  : 1024 KB
physical id : 0
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 6
initial apicid  : 6
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat
bugs        : cpu_meltdown spectre_v1 spectre_v2
bogomips    : 3200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Upvotes: 1

Views: 448

Answers (1)

Bergi

Reputation: 664425

You are ignoring the severe overhead of web workers. No wonder your code - which creates a blob by stringifying a function, creates a worker from that blob, parses the worker's code, instantiates a promise, sends a message, sets up a listener, and waits asynchronously for the result from the worker - is a few thousand times slower than doing a single multiplication of two doubles. In fact, I am very surprised that you even manage to spawn 10,000 workers in only 5 seconds.

This benchmark estimates setting up a worker at around 40 ms. So no, workers are not lightweight threads that you can spawn for every little task. They are meant as long-lived worker threads: you should send (many) messages to them to be processed and responded to, as sketched below. You might want to create a worker pool to spread the load over multiple threads.
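
A minimal sketch of that pattern - one long-lived worker, many messages - assuming a simple id-based dispatch (the workerCode and requestSquare names are illustrative, not from the question):

const workerCode = `
  self.onmessage = function (e) {
    // compute the square and send it back, tagged with the request id
    self.postMessage({ id: e.data.id, result: e.data.n * e.data.n });
  };
`;
const worker = new Worker(
  URL.createObjectURL(new Blob([workerCode], { type: 'application/javascript' }))
);

let nextId = 0;
const pending = new Map();
worker.onmessage = function (e) {
  pending.get(e.data.id)(e.data.result); // resolve the matching promise
  pending.delete(e.data.id);
};

function requestSquare(n) {
  return new Promise(function (resolve) {
    const id = nextId++;
    pending.set(id, resolve);
    worker.postMessage({ id: id, n: n }); // just a message, no new worker
  });
}

// The worker is created once; each call is only a postMessage round trip.
Promise.all([1, 2, 3].map(requestSquare)).then(console.log); // [1, 4, 9]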

Of course, for your square function none of this applies. It's too small - it does only a single multiplication. Doing it on the main thread will be faster than anything that has to communicate with another thread. If you had a 100,000-iteration loop inside that function, it might become worth running in a background thread (see the sketch below). The 3 ms single-threaded run you measured doesn't even noticeably block the main thread.
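
For example, instead of one worker per number, the whole loop could be shipped to a single worker using your own thread() helper (a sketch; the squares name is illustrative):

const squares = thread(function (n) {
  var out = [];
  for (var i = 0; i < n; i++) {
    out.push(i * i);
  }
  postMessage(out); // one message back with all the results
});

squares(100000).then(function (results) {
  console.log(results.length); // 100000, computed off the main thread
});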

Upvotes: 1
