c++multithreadingc++11parallel-processing

Reputation: 424

Why can't I get any performance improvements by running multiple threads in C++11?

I have the following test program with a simple function that finds primes which I am trying to run in multiple threads (just as an example).

#include <cstdio>
#include <iostream>
#include <ctime>
#include <thread>

void primefinder(void)
{
   int n = 300000;

   int i, j;
   int lastprime = 0;
   for(i = 2; i <= n; i++) {
      for(j = 2; j <= i; j++) {
           if((i % j) == 0) {
               if(i == j)
                   lastprime = i;
               else {
                   break;
               }
           }
      }
   }

   std::cout << "Prime: " << lastprime << std::endl;
}

int main(void)
{
   std::clock_t start;
   start = std::clock();

   std::thread t1(primefinder);
   t1.join();

   std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

   start = std::clock();

   std::thread t2(primefinder);
   std::thread t3(primefinder);
   t2.join();
   t3.join();

   std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
   return 0;
}

As shown, I run the function once in 1 thread and then once in 2 different threads. I compile it with g++ using -O3 and -pthread. I am running it on Linux Mint 18. I have a Core i5-4670. I know it comes down to the OS but I would very much expect these threads to run in somewhat parallel. When I run the program, top shows 100% CPU when using 1 thread and 200% CPU when using 2 threads. Despite this the second run takes almost exactly twice as long.

The CPU is doing nothing else while running the program. Why doesn't this get executed in parallel ?

Edit: I know both threads are doing the exact same thing. I chose the primerfinder function simply as an example of something embarrassingly parallel so when I run it in multiple threads it should take just as long in real time.

Upvotes: 4

Answers (4)

E.Wijmans

Reputation: 104

Using std::clock to time a parallel program in c++ is very deceptive. There are two types of time that we care about when timing a program: wall time and cpu time. Wall time is what we are all used to (think clock on a wall). Cpu time is essentially how many cpu cycles your program used. std::clock returns cpu time (this is why you are dividing by CLOCKS_PER_SEC) and cpu time is only equal to wall time when there is one thread of execution. If a program can be run 100% in parallel (like your's), then cpu time = (number of threads)*(wall time). So seeing almost exactly twice as long is exactly what you would expect.

For measuring wall time (which is what you want to do), I don't know of a way to do that using the STL. You can measure it using OpenMP or Boost.

omp_get_wtime()

Boost Timer

Since you are on linux, the version of g++ that you are using more than likely has openmp support built in.

#include <omp.h>

const double startTime = omp_get_wtime();
..... //Work goes here

const double time = omp_get_wtime() - startTime;

You will have to compile with -fopenmp

EDIT:

As johnbakers pointed out, the chrono library does have a wall clock

#include <chrono>

auto start = std::chrono::system_clock::now();
.... //Do some work

auto end = std::chrono::system_clock::now();
std::chrono::duration<double> diff = end - start;
std::cout << "Time: " << diff.count() << "(s)" << std::end;

Output of that vs. boost timer:

Boost: 121.685972s wall, 724.940000s user + 67.660000s system = 792.600000s CPU  (651.3%)
Chrono: 121.683(s)

Upvotes: 8

Chris Beck

Reputation: 16214

There's a pretty basic problem in your design, which explains why you don't see any benefits from threads.

When you have a search problem like this and you want to speed it up with parallelization, the idea is usually to use divide and conquer. You need to somehow divy up the work so that e.g. the first thread will do the first half of the work, and the second thread will do the second half of the work.

In your code, both threads call exactly the same function and they don't communicate -- they each duplicate the other guy's work!

In some problems it's really easy to divy up the work. For instance, if you were working on a SAT solver, one way to divide the work between two threads would be to choose a variable x_1, and say, the first thread will assume x_1 = true and check if the formula is satisfiable, and the second thread will assume x_1 = false and check if the formula is satisfiable. Then when they are both joined you will know if overall the formula is satisfiable, and there are no mutexes or interthread communication needed.

In your primes problem it's a bit more complicated. You could try to do something like, the first thread only considers candidates ending in 1, 3, 5 and the second thread considers candidates ending in 7, 9. For best performance though, you probably want to use something like "Eratosthenes' Sieve" and I think that would be a little harder to parallelize. (You could maybe use an array of atomics?)

Upvotes: 2

johnbakers

Reputation: 24771

Take a look at the std::chrono namespace. It offers many C++11 utilities for measuring time that are superior to every suggestion here thus far.

For example, you can cast a clock reading to wall time.

Timing programming code is not trivial, but these tools do make it easier to explore along these lines.

Upvotes: 1

Thomas Matthews

Reputation: 57749

There are many possible reasons why the code is not in parallel.

In days of old, operating systems would run pieces of executables in a round-robin or priority method. So one thread of execution could run for a couple of milliseconds, then swapped out with another thread of execution. This would give the appearance that threads of execution are run in parallel.

The swapping out of threads of execution would also occur when one thread waits on a resource and the resource is not available.

In modern computers, with multiple processors or cores, the technique is still the same. The OS has another processor it can delegate tasks to. Core time is precious. The OS is unlikely to stop all the running tasks on multiple cores so your threads can execute in parallel. This probably means that the OS would have to wait for one processor to finish so your threads can be executed at the same time. Most likely ain't going to happen.

However, many OS have attributes that you can set up to tell them to give your threads exclusive access to one or more cores. Since this is not a standard C++ functionality and OSes are not all the same, you'll have to look up the API or any compiler specific support.

Edit 1: Interrupts and other tasks
Keep in mind that your platform is not running your programs exclusively. Other tasks are lurking and may be executing while your program is running. Some examples include: virus checkers, things pinging the internet, and music players (at least on my machine).

These applications and interrupts may cause the OS to play round-robin with your threads on the same processor and not in parallel. Once scenario would be to have the music player on one processor why your program is running on another.

Upvotes: 1

Why can&#39;t I get any performance improvements by running multiple threads in C++11?

Answers (4)

Related Questions

Why can't I get any performance improvements by running multiple threads in C++11?