Equal time execution for running with 4 and 8 threads

Question

I am testing some code that uses OpenMP. Here it is:

#include <chrono>
#include <iostream>
#include <omp.h>

#define NUM_THREADS 8
#define ARR_SIZE 10000

class A {
private: 
    int a[ARR_SIZE];
public:
    A() {
        for (int i = 0; i < ARR_SIZE; i++)
            a[i] = i;
    }
// <<-----------MAIN CODE HERE--------------->
    void fn(A &o1, A &o2) {
        long long some = 0;  // 64-bit: the full sum (about 2.5e15) overflows int
        #pragma omp parallel num_threads(NUM_THREADS)
        {
            #pragma omp for reduction(+:some)
            for (int i = 0; i < ARR_SIZE; i++) {
                for (int j = 0; j < ARR_SIZE; j++)
                    some += o1.a[i] * o2.a[j];
            }
        }
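        std::cout << some << std::endl;
    }
};

int main() {
    A a, b, c;

    // Time a single call to fn().
    auto start = std::chrono::steady_clock::now();
    c.fn(a, b);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = end - start;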
    std::cout << elapsed.count();
}

Execution time:

1 thread : 0.233663 sec
2 threads : 0.12449 sec
4 threads : 0.0665889 sec
8 threads : 0.0643735 sec

As you can see, there is almost no difference in execution time between 4 and 8 threads. What could be the reason for this behavior? It would also be nice if you tried this code on your machine ;).

P.S. My processor:

Model:               Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz 
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1

zneak · Accepted Answer

You have 4 physical cores. The promise of hyperthreading is that each core can "think about" two tasks and will dynamically switch between the two when it gets blocked on one (for instance, if it needs to wait for a memory operation to finish). In theory, this means that less time is wasted waiting for operations to complete. In practice, however, the actual performance gain tends to be nowhere close to the 2x improvement that you'd get by doubling the number of cores. The improvement is typically between 0 and 0.3x, and sometimes hyperthreading even causes slowdowns.

Four threads is essentially the upper bound on useful threads for the computer you are using. A computer with 8 physical cores might get the speedup that you expect.
