John J
John J

Reputation: 33

Multi-threaded code not working as expected

As I am very new to the concept of Multi-Threading, I'm trying to keep things very simple. The complete program I have written below in Visual Studio 2019 was an attempt to compare the performance of a single thread vs 3 threads running simultaneously. Each thread has its own data to work on so I have assumed there is no need for data protection from the other threads.

The algorithm simply creates the first 64 numbers of the Fibonacci sequence, and is recreated millions of times. The algorithm doesn't really have a practical use. It was just created to waste CPU time.

On a modern day laptop running at 2 GHz, a single thread takes about 7.6 seconds to complete. Unfortunately with 3 threads, most times it will take around 7 seconds to complete. This is marginally better than a single thread! However occasionally 3 threads will take about 2.54 seconds to complete. This last time is a little over 1/3 the time of a single thread and is what I expected to occur most of the time.

I'm not sure what's going on here. But it does appear to be a stall of some sort or a synchronisation issue. Any help appreciated.

#include <stdio.h>
#include <iostream>
#include <thread>
#include <chrono>

#define ITERATIONS 10000000

using namespace std;
using namespace std::chrono;

void WasteTime(__int64 fib[], int start, int end)
{
    int i;
    
    for (i = start; i < end; i++)
    {
        // Initialise the sequence
        fib[0] = 1;
        fib[1] = 1;
        
        // Create Fibonacci sequence
        for (int f1 = 0, f2 = 1, f3 = 2; f1 < 62; f1++, f2++, f3++)
        {
            fib[f3] = fib[f1] + fib[f2];
        }

        // Display iteration count
        if (i % 0x1000000 == 0)
            printf("%10u\n", i);
    }
    // Display final iteration count for thread
    printf("%10u\n", i);
}

void WasteTimeUsingThreads(__int64 fib1[], __int64 fib2[], __int64 fib3[])
{
    thread t1(WasteTime, fib1,               0, 10 * ITERATIONS);
    thread t2(WasteTime, fib2, 10 * ITERATIONS, 20 * ITERATIONS);
    thread t3(WasteTime, fib3, 20 * ITERATIONS, 30 * ITERATIONS);

    // Before the main thread continues running, wait for all threads to finish.
    t1.join();
    t2.join();
    t3.join();
}

int main()
{
    __int64 fib1[64], fib2[64], fib3[64];

    auto start = steady_clock::now();
    WasteTime(fib1, 0, 30 * ITERATIONS);
    auto end = steady_clock::now();
    cout << "Time for normal way is: " << duration_cast<milliseconds>(end - start).count() << endl;

    start = steady_clock::now();
    WasteTimeUsingThreads(fib1, fib2, fib3);
    end = steady_clock::now();
    cout << "Time for multithreaded way is: " << duration_cast<milliseconds>(end - start).count() << endl;
    
    return 0;
}

Upvotes: 0

Views: 807

Answers (1)

First, your fib1, fib2, fib3 arrays are small, and probably share the same CPU cache line on a modern processor (e.g. some Intel or AMD in 64 bits mode, designed after 2015). So you could have contentions. Read about cache coherency protocols.

Did you try with arrays larger than the page size (usually 4Kbytes at least on many processors)?

Be sure to enable optimizations in your compiler. With GCC, compile with g++ -O2 at least.

And you could use profiling utilities, like Linux oprofile(1)

On my Linux Ubuntu 21 desktop (AMD Ryzen Threadripper 2970WX), I use (with <stdint.h> header) #define ITERATIONS 20000000 and int64_t fib1[4096], fib2[4096], fib3[4096]; as local variables, and compiled with /usr/bin/g++-10 -std=c++11 -O2 your john.cc file. Output has:

Time for normal way is: 47255
Time for multithreaded way is: 16074

I tried also to compile with /usr/bin/g++-11 -std=c++11 -O3 john.cc -o john -lstdc++ -lpthread and the performance are even better:

Time for normal way is: 11714
Time for multithreaded way is: 3989

Same code with

int64_t fib1[128], fib2[128], fib3[128];

compiled using /usr/bin/g++-11 -std=c++11 -O1 john.cc -o john -lstdc++ -lpthread gives

Time for normal way is: 51973
Time for multithreaded way is: 17678

which is consistent with my intuitions.

Upvotes: 1

Related Questions