I'm trying to execute the following C++ multi-threaded program on WSL2:
```cpp
#include <iostream>
#include <pthread.h>

#define N (1024 * 1024 * 1024)  // iterations per thread (2^30)
#define num_threads 8

using namespace std;

long x[num_threads] = {0};  // one counter per thread

// thread function prototype
void* threadnfunc(void*);
void func(int);

int main() {
    cout << "With threads" << endl;
    pthread_t my_threads[num_threads];
    int arg_to_be_passed_to_func[num_threads];  // index into array x

    // create (and start) the threads
    for (int i = 0; i < num_threads; i++) {
        arg_to_be_passed_to_func[i] = i;
        pthread_create(&my_threads[i], NULL, threadnfunc, &arg_to_be_passed_to_func[i]);
    }
    // wait for all threads to finish
    for (int i = 0; i < num_threads; i++) {
        pthread_join(my_threads[i], NULL);
    }

    // cout << "Without threads" << endl;
    // for (int i = 0; i < num_threads; i++) {
    //     func(i);
    // }
    return 0;
}

void* threadnfunc(void* arg) {
    int j = *(int*)arg;
    cout << "Starting thread " << j << endl;
    for (int i = 0; i < N; i++) x[j] += 1;
    cout << "Thread " << j << " execution completed!" << endl;
    return NULL;
}

void func(int j) {
    cout << "Starting thread " << j << endl;
    for (int i = 0; i < N; i++) x[j] += 1;
    cout << "Thread " << j << " execution completed!" << endl;
}
```
I expect this program to run significantly faster than a single-threaded version because it distributes the workload across multiple threads. On a native Ubuntu VM, I observe a clear speedup. However, on WSL2, the execution time remains nearly the same for both the multi-threaded and single-threaded versions.
System info:

- Processor: Intel Core i7-10750H
- Processor base clock: 2.60 GHz
- RAM: 16 GB
- Cores: 6
- Logical processors: 12
Running `nproc` inside WSL gives 12, which I believe means WSL has access to all logical processors (I have set `processors=12` in the `.wslconfig` file). Virtualization and hyper-threading are both enabled in the BIOS. Task Manager shows all logical CPUs (0-11) at 30-35% utilization, so I suspected CPU throttling. However, `cat /proc/cpuinfo | grep MHz` shows 2592 MHz for every CPU, which is close to the processor's 2.60 GHz base clock, so I'm not sure whether throttling is the issue. Other than that, I couldn't make much sense of the output of `lscpu`; here it is in case it helps:
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
CPU family: 6
Model: 165
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
BogoMIPS: 5184.01
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization: VT-x
  Hypervisor vendor: Microsoft
  Virtualization type: full
Caches (sum of all):
  L1d: 192 KiB (6 instances)
  L1i: 192 KiB (6 instances)
  L2: 1.5 MiB (6 instances)
  L3: 12 MiB (1 instance)
Vulnerabilities:
  Gather data sampling: Unknown: Dependent on hypervisor status
  Itlb multihit: KVM: Mitigation: VMX disabled
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Reg file data sampling: Not affected
  Retbleed: Mitigation; Enhanced IBRS
  Spec rstack overflow: Not affected
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
  Srbds: Unknown: Dependent on hypervisor status
  Tsx async abort: Not affected
```
How do I fix this issue where WSL2 does not exhibit any performance gain from multi-threading compared to the single-threaded version, despite having access to all logical processors?