Omair Siddique

Reputation: 1

Unable to achieve multithreading in WSL2

I'm trying to execute the following C++ multi-threaded program on WSL2:

#include <iostream>
#include <pthread.h>
#define N (1024*1024*1024)
#define num_threads 8
using namespace std;

long x[num_threads] = {0};

// thread function prototype
void* threadnfunc (void *);
void func (int);

int main() {
    cout << "With threads" << endl;
    pthread_t my_threads[num_threads];
    int arg_to_be_passed_to_func[num_threads];      // idx of array x

    // create threads
    for(int i=0; i<num_threads; i++){
        arg_to_be_passed_to_func[i] = i;
        pthread_create (&my_threads[i], NULL, threadnfunc, &arg_to_be_passed_to_func[i]);
    }
    
    // wait for all threads to finish
    for(int i=0; i<num_threads; i++){
        pthread_join (my_threads[i], NULL);
    }

    // cout << "Without threads" << endl;
    // for(int i=0; i<num_threads; i++){
    //     func(i);
    // }

    return 0;
}

void* threadnfunc (void *arg) {
    int j = *(int*)arg;
    cout << "Starting thread " << j << endl;
    for(int i=0; i<N; i++) x[j]+=1;
    cout << "Thread " << j << " execution completed!" << endl;
    return NULL;
}

void func(int j) {
    cout << "Starting thread " << j << endl;
    for(int i=0; i<N; i++) x[j]+=1;
    cout << "Thread " << j << " execution completed!" << endl;
}

I expect this program to run significantly faster than a single-threaded version because it distributes the workload across multiple threads. On a native Ubuntu VM, I observe a clear speedup. However, on WSL2, the execution time remains nearly the same for both the multi-threaded and single-threaded versions.


System Info

Processor: Intel Core i7-10750H
Processor Base Clock: 2.60GHz
RAM: 16GB
Cores: 6
Logical Processors: 12


Running nproc inside WSL prints 12, which I believe means WSL has access to all logical processors (I set processors=12 in the .wslconfig file). Virtualization and hyper-threading are both enabled in the BIOS. Task Manager shows all CPUs (0-11) at roughly 30-35% utilization. This might be CPU throttling; however, cat /proc/cpuinfo | grep MHz reports 2592 MHz for each CPU, which is close to the processor's base clock, so I'm not sure whether throttling is the cause. I couldn't interpret much of the lscpu output, but here it is in case it helps:

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   12
  On-line CPU(s) list:    0-11
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
    CPU family:           6
    Model:                165
    Thread(s) per core:   2
    Core(s) per socket:   6
    Socket(s):            1
    Stepping:             2
    BogoMIPS:             5184.01
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse
                          sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid
                          pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx
                          f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp
                          ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap
                          clflushopt xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization:         VT-x
  Hypervisor vendor:      Microsoft
  Virtualization type:    full
Caches (sum of all):
  L1d:                    192 KiB (6 instances)
  L1i:                    192 KiB (6 instances)
  L2:                     1.5 MiB (6 instances)
  L3:                     12 MiB (1 instance)
Vulnerabilities:
  Gather data sampling:   Unknown: Dependent on hypervisor status
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; Enhanced IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence;
                           BHI SW loop, KVM SW loop
  Srbds:                  Unknown: Dependent on hypervisor status
  Tsx async abort:        Not affected

How do I fix this issue where WSL2 does not exhibit any performance gain from multi-threading compared to the single-threaded version, despite having access to all logical processors?

