Reputation: 21
I have run into a rather frustrating problem with OpenMP: it seems that if OpenMP is used in parallel mode somewhere in the code (for more than one thread), then dynamic memory allocation/de-allocation becomes slower even in non-parallel portions of code. Here is an example program (just an illustration):
#include <cmath>
#include <iostream>
#include <omp.h>

using namespace std;

int main()
{
#pragma omp parallel
    {
        // Just to get OpenMP going
    }

    double wtime0, wtime;
    wtime0 = omp_get_wtime();

    double **stuff;
    const int N = 1000000;
    stuff = new double*[N];
    for (int i = 0; i < N; i++) stuff[i] = new double;
    for (int i = 0; i < N; i++) *(stuff[i]) = sqrt(double(i));
    for (int i = 0; i < N; i++) delete stuff[i];   // scalar delete: each element came from plain new
    delete[] stuff;

    wtime = omp_get_wtime() - wtime0;
    cout << "Total wall-clock time: " << wtime << endl;
    return 0;
}
When I run this code with one thread on my laptop (an Intel Core 2 Duo), I get a wall-clock time of 0.093 s. If I run it with two threads instead, the time increases to 0.13 s. The more pointer allocations there are, the worse the discrepancy becomes. In the above code, if I were to replace "stuff" with a simple array, e.g.
double stuff2[N];
for (int i = 0; i < N; i++) stuff2[i] = sqrt(double(i));
then there is no discrepancy. Can someone tell me why this problem exists when pointers are allocated/de-allocated, even though the allocation is not done in parallel? This matters because in the real code I am working with, dynamic memory allocation is essential. There are sections that can be sped up by running in parallel, but (with two threads versus one) that gain is more than offset by the considerable slowdown of memory allocation/de-allocation, even in the non-parallel sections. If someone with extensive OpenMP experience can tell me how to get around this problem, I would really appreciate it. (Worst case, I can just use MPI instead, but I would love it if this could be solved within OpenMP.)
Thanks in advance for the help.
Upvotes: 2
Views: 1355
Reputation: 62563
Yes, this is conceivable. In general, one should avoid naive dynamic allocation in a multi-threaded environment, because the default allocator serializes every allocation behind a single lock. MT-aware allocators provide much better performance and should be preferred in allocation-heavy scenarios. This is exactly why I always frown on code here that uses vectors, strings, or shared pointers as class members without letting users specify an allocation policy.
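For illustration, here is a minimal sketch (assuming C++11) of what exposing an allocation policy can look like; the names SmallBuffer and MyPoolAllocator are made up for this example and are not taken from any particular library:

#include <cstddef>
#include <memory>
#include <vector>

// Container wrapper that exposes its allocation policy as a template
// parameter instead of hard-coding std::allocator. Callers doing
// allocation-heavy multi-threaded work can plug in an MT-aware allocator.
template <typename T, typename Alloc = std::allocator<T> >
class SmallBuffer {
public:
    explicit SmallBuffer(std::size_t n) : data_(n) {}

    T&       operator[](std::size_t i)       { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<T, Alloc> data_;  // storage obtained through the chosen policy
};

// Usage:
//   SmallBuffer<double> a(1000);                             // default allocator
//   SmallBuffer<double, MyPoolAllocator<double> > b(1000);   // user-supplied policy (hypothetical type)

Independently of the class design, you can often get most of the benefit simply by replacing the system allocator with an MT-aware one such as tcmalloc or jemalloc, e.g. by linking with -ltcmalloc or running the program under LD_PRELOAD; the exact library name and path depend on your system.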
Upvotes: 4