Reputation: 18219
I just read this intro to parallel processing with OpenMP.
I tried the following simple code
#include <iostream>
#include <ctime>
#include <vector>
int main()
{
    // Create an object just to allow the following loops to do something
    std::vector<int> a;
    a.reserve(2000);

    // First single threaded loop
    std::clock_t begin;
    std::clock_t end;
    begin = std::clock();
    double elapsed_secs;
    for(int n=0; n<1000000000; ++n)
    {
        if (n%100000000 == 0) a.push_back(n);
    }
    end = std::clock();
    elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Time for single thread loop: " << elapsed_secs << std::endl;

    // Second multithreaded loop
    begin = std::clock();
    #pragma omp parallel for
    for(int n=0; n<1000000000; ++n)
    {
        if (n%100000000 == 0) a.push_back(n);
    }
    end = std::clock();
    elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Time for multi thread loop: " << elapsed_secs << std::endl;
    return 0;
}
which I compiled with g++ -std=c++11 -o a a.cpp -fopenmp and which outputs
Time for single thread loop: 3.9438
Time for multi thread loop: 3.94977
Note that my machine has 12 cores (and no big processes currently running). Why doesn't the multithreaded loop run any faster than the single-threaded one?
Upvotes: 1
Views: 722
Reputation: 4343
You're not measuring real time but CPU time with std::clock. Better use std::chrono, as another answer suggested.
Or for a quick test without changing your code, try this in a shell:
date; time ./a; date
This was the output:
jue dic 1 23:12:57 CET 2016
Time for single thread loop: 2.99741
Time for multi thread loop: 4.55788
real 0m4.184s
user 0m7.556s
sys 0m0.000s
jue dic 1 23:13:01 CET 2016
The times differ from your output: the real (wall-clock) time is about 4 s on my PC, not the roughly 7.5 s that your program reports in total.
You should read the docs about std::clock(), specifically:
For example, if the CPU is shared by other processes, std::clock time may advance slower than wall clock. On the other hand, if the current process is multithreaded and more than one execution core is available, std::clock time may advance faster than wall clock.
Upvotes: 1
Reputation: 45424
It should also be said that what you're doing in the parallel loop is nonsensical and bound to cause run-time errors. There are two issues.
First
#pragma omp parallel for
for(int n=0; n<1000000000; ++n)
{ if(n%100000000 == 0) <some code> }
only ever does anything 10 (ten) times. The compiler may optimise the loop variable n away, and you're left with code equivalent to
#pragma omp parallel for
for(int n=0; n<10; ++n)
{ <some code> }
which only benefits from parallelism if <some code> is computationally demanding. So you're actually not testing anything.
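For illustration only (this sketch is mine, not part of the original answer), a loop whose body does real work on every iteration, here a sum of square roots combined with a reduction clause, is the kind of test where the parallel version can actually win:
#include <cmath>
#include <iostream>

int main()
{
    const int N = 100000000;
    double sum = 0.0;
    // Each iteration does a non-trivial computation, so splitting the range
    // across threads can pay off; the reduction clause gives every thread a
    // private partial sum and combines them at the end.
    #pragma omp parallel for reduction(+:sum)
    for (int n = 0; n < N; ++n)
    {
        sum += std::sqrt(static_cast<double>(n));
    }
    std::cout << "sum = " << sum << std::endl;
    return 0;
}
Compiled with -fopenmp, a loop like this should scale roughly with the number of cores, unlike the mostly empty loop above.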
Second, and more seriously,
a.push_back(n);
is not thread safe. That is, you must not call it (potentially) concurrently from different threads. Each call to std::vector::push_back() changes the state of the vector, i.e. its internal data, causing a race condition.
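One minimal way to keep the original push_back while avoiding the race (my sketch, not part of the original answer) is to serialize that one statement with a critical section:
#pragma omp parallel for
for (int n = 0; n < 1000000000; ++n)
{
    if (n % 100000000 == 0)
    {
        // Only one thread at a time may execute this block, so the
        // concurrent calls to push_back no longer race.
        #pragma omp critical
        a.push_back(n);
    }
}
Of course the critical section serializes exactly the interesting work, so this fixes correctness rather than performance; collecting into per-thread vectors and merging afterwards would be the more scalable pattern.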
Finally, I would recommend not using OpenMP for parallelism in C++, because it does not support/exploit C++ language features (such as templates) and is not standardized against recent C++ standards. Instead, use something like TBB, which is designed for C++.
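As a rough illustration of that suggestion (my sketch, not from the original answer; it assumes TBB is installed and you link with -ltbb), the compute-heavy loop from above could be written with tbb::parallel_reduce:
#include <cmath>
#include <functional>
#include <iostream>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

int main()
{
    const int N = 100000000;
    // parallel_reduce splits the range across worker threads; each chunk
    // accumulates into its own local sum, and std::plus combines the partial sums.
    double sum = tbb::parallel_reduce(
        tbb::blocked_range<int>(0, N), 0.0,
        [](const tbb::blocked_range<int>& r, double local) {
            for (int n = r.begin(); n != r.end(); ++n)
                local += std::sqrt(static_cast<double>(n));
            return local;
        },
        std::plus<double>());
    std::cout << "sum = " << sum << std::endl;
    return 0;
}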
Upvotes: 0
Reputation: 6731
std::clock measures CPU time, not wall time (at least in the gcc implementation, though I believe the MSVC implementation measures wall time). This is an excerpt from cppreference:
Returns the approximate processor time used by the process since the beginning of an implementation-defined era related to the program's execution. To convert result value to seconds divide it by CLOCKS_PER_SEC.
Only the difference between two values returned by different calls to std::clock is meaningful, as the beginning of the std::clock era does not have to coincide with the start of the program.
std::clock time may advance faster or slower than the wall clock, depending on the execution resources given to the program by the operating system. For example, if the CPU is shared by other processes, std::clock time may advance slower than wall clock. On the other hand, if the current process is multithreaded and more than one execution core is available, std::clock time may advance faster than wall clock.
You can measure wall time with the std::chrono facilities:
auto Begin = std::chrono::high_resolution_clock::now();
// ...
auto End = std::chrono::high_resolution_clock::now();
std::cout << "Time for xxx: " << std::chrono::duration_cast<std::chrono::milliseconds>(End - Begin).count() << std::endl;
and you will see the real speedup.
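Applied to your program, timing the parallel loop might look roughly like this (a sketch on the assumption that you also guard push_back, as discussed elsewhere in this thread; steady_clock is the usual choice for measuring intervals):
#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> a;
    a.reserve(2000);

    auto begin = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (int n = 0; n < 1000000000; ++n)
    {
        if (n % 100000000 == 0)
        {
            // push_back itself is not thread safe, so serialize it
            #pragma omp critical
            a.push_back(n);
        }
    }
    auto end = std::chrono::steady_clock::now();

    // Wall-clock duration, independent of how many cores the loop used
    std::cout << "Time for multi thread loop: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count()
              << " ms" << std::endl;
    return 0;
}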
As a side note, I would say that your test is not thread safe, because push_back needs to modify the end position of your vector.
Upvotes: 1
Reputation: 166
I think the reason your 2nd loop was slower is just the overhead required for threading. The work you're doing is very minimal and tends to be fast on its own. push_back is (amortized) constant time: the function just writes the new item at the current end of the container and then updates the end pointer. I think if you were to put more complicated code into the loop you would start to see a difference. I also noticed you never clear 'a' between loops, so the second loop appends to a vector that already holds the first loop's items, which might cause the 2nd loop (even if it wasn't threaded) to run slightly slower.
I don't think you've misunderstood how to thread so much as you've overlooked the overhead (including thread creation, initialization, context switches, etc.) needed to do the threading.
This question seems to be rather common, but at the same time the answer to each one can be very different, as many things go into threading (https://unix.stackexchange.com/questions/80424/why-using-more-threads-makes-it-slower-than-using-less-threads). In that link the answer is broader, stating that thread speed depends heavily on system resources such as CPU, RAM, and network I/O. This link: Why is my multi-threading slower than my single threading? shows that the OP was writing to the console, which is where the problem was (according to the link, the console class handles the thread synchronization, so the code in that built-in class is what made the single-threaded version run faster).
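To get a feel for that overhead on your own machine (a sketch I'm adding, not part of the original answer), you can time a parallel loop whose body is essentially empty; most of what you measure is then the cost of setting up and synchronizing the thread team:
#include <chrono>
#include <iostream>

int main()
{
    auto begin = std::chrono::steady_clock::now();
    // Trivial body: nearly all of the measured time is threading overhead.
    #pragma omp parallel for
    for (int n = 0; n < 12; ++n)
    {
        volatile int x = n;  // keeps the loop from being optimized away
        (void)x;
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << "Parallel-for overhead: "
              << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count()
              << " us" << std::endl;
    return 0;
}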
Upvotes: 0