Reputation: 55
I need to compute a mathematically intensive function many (>10,000) times, and I thought I'd use web workers to reduce the computation time.
I am using a helper function that turns a function into a blob, wraps its execution in a promise, and runs a web worker from that blob. I tested it and it works, but it runs significantly slower than the single-threaded approach:
Single-threaded: 3 milliseconds
Multi-threaded: 5524 milliseconds
Complete code including time test:
https://codepen.io/zakerytclarke/pen/BgRyBm?editors=0012
This code calculates the first n squares and pushes them to an array. The console shows the respective times for running single threaded and multithreaded.
This is the function that I am using to promisify web workers. Is there something wrong with it that causes the execution time to be so much longer than a simple for loop?
function thread(fn) {
  return function (args) {
    return new Promise(function (resolve) {
      // Build a worker from a blob that immediately invokes fn with args
      var worker = new Worker(URL.createObjectURL(
        new Blob(['(' + fn + ')(' + JSON.stringify(args) + ')'])
      ));
      worker.postMessage(args);
      worker.onmessage = function (event) {
        resolve(event.data);
        worker.terminate();
      };
    });
  };
}
Thanks for your help.
Here is the info on my CPU in case that matters (condensed from /proc/cpuinfo; all four logical processors report identical capabilities):

processor     : 0-3
vendor_id     : GenuineIntel
model name    : Intel(R) Atom(TM) x7-Z8700 CPU @ 1.60GHz
cpu cores     : 4
siblings      : 4
cache size    : 1024 KB
cpu MHz       : 557-901 (scaled down from the nominal 1.60 GHz at capture time)
bogomips      : 3200.00
address sizes : 36 bits physical, 48 bits virtual
Upvotes: 1
Views: 448
Reputation: 664425
You are ignoring the severe overhead of web workers. No wonder your code is a few thousand times slower than a single multiplication of two doubles: for every call, it stringifies a function into a blob, creates a worker from that blob, parses the worker's code, instantiates a promise, sends a message, sets up a listener, and waits asynchronously for the result. In fact, I am surprised that you even manage to spawn 10,000 workers in only 5 seconds.
This benchmark estimates setting up a worker at around 40 ms. So no, workers are not lightweight threads that you can spawn for just anything. They are meant as worker threads: create a few of them and send them (many) messages to be processed and responded to. You might want to create a worker pool to spread the load over multiple threads.
Of course, none of this applies to your square function. It is too small: it does only a single multiplication, so doing it on the main thread will always be faster than anything that has to communicate with another thread. If the function contained, say, a 100,000-iteration loop, it might become worth running in a background thread. The 3 ms single-threaded time you achieved doesn't even noticeably block the main thread.
Upvotes: 1