Reputation: 43099
The test case below runs out of memory on 32-bit machines (throwing std::bad_alloc) in the loop following the "post MT section" message when OpenMP is used. If the OpenMP #pragmas are commented out, the code runs to completion, so it appears that memory allocated in the parallel threads is not freed correctly and we eventually run out of memory.
The question is whether there is something wrong with the memory allocation and deletion code below, or whether this is a bug in gcc v4.2.2 or OpenMP. I also tried gcc v4.3 and got the same failure.
#include <cstddef>
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    std::cout << "start " << std::endl;
    {
        std::vector<std::vector<int*> > nts(100);
        #pragma omp parallel
        {
            // Each thread allocates a share of the 100 inner vectors.
            #pragma omp for
            for(int begin = 0; begin < int(nts.size()); ++begin) {
                for(int i = 0; i < 1000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        }
        std::cout << " pre delete " << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(std::size_t j = 0; j < nts[begin].size(); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "post MT section" << std::endl;
    {
        // Same allocation pattern, single-threaded, with twice as many
        // allocations per inner vector.
        std::vector<std::vector<int*> > nts(100);
        int begin, i;
        try {
            for(begin = 0; begin < int(nts.size()); ++begin) {
                for(i = 0; i < 2000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        } catch (std::bad_alloc &e) {
            std::cout << e.what() << std::endl;
            std::cout << "begin: " << begin << " i: " << i << std::endl;
            throw;
        }
        std::cout << "pre delete 1" << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(std::size_t j = 0; j < nts[begin].size(); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "end of prog" << std::endl;
    char c;
    std::cin >> c;
    return 0;
}
Upvotes: 6
Views: 7322
Reputation: 43099
I have seen this issue reported elsewhere without OpenMP, just using pthreads. The extra memory consumption when multi-threaded appears to be typical behavior for the standard memory allocator. Switching to the Hoard allocator makes the extra memory consumption go away.
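One common way to try Hoard on Linux is to preload it at run time rather than relinking; a minimal sketch, assuming Hoard has been built as libhoard.so (the path and program name below are placeholders):
# Preload the Hoard allocator so malloc/new in the test program go through it.
LD_PRELOAD=/path/to/libhoard.so ./test_case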
Upvotes: 3
Reputation: 11130
Changing the iteration count in the first OpenMP loop from 1000000 to 2000000 causes the same error. This suggests that the out-of-memory problem is related to the OpenMP stack limit.
Try setting the stack limit to unlimited in bash with
ulimit -s unlimited
You can also set the OpenMP environment variable OMP_STACKSIZE to 100MB or more.
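For example, in bash (100M here is just the value suggested above; pick a size that matches how much each thread allocates):
export OMP_STACKSIZE=100M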
UPDATE 1: I changed the first loop to
{
    std::vector<std::vector<int*> > nts(100);
    #pragma omp for schedule(static) ordered
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int i = 0; i < 2000000; ++i) {
            nts[begin].push_back(new int(5));
        }
    }
    std::cout << " pre delete " << std::endl;
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(std::size_t j = 0; j < nts[begin].size(); ++j) {
            delete nts[begin][j];
        }
    }
}
Then I get a memory error at i = 1574803 on the main thread.
UPDATE 2: If you are using the Intel compiler, you can add the following to the top of your code and it will solve the problem (provided you have enough memory for the extra overhead).
std::cout << "Previous stack size " << kmp_get_stacksize_s() << std::endl;
kmp_set_stacksize_s(1000000000);
std::cout << "Now stack size " << kmp_get_stacksize_s() << std::endl;
UPDATE 3: For completeness, as mentioned by another member, if you are performing numerical computation it is best to preallocate everything in a single new float[1000000] rather than using OpenMP to do 1000000 separate allocations. The same applies to allocating objects; a sketch of the pattern is shown below.
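A minimal sketch of that idea, assuming the per-element values can be computed independently (the array size and fill value are placeholders, not taken from the original code):
#include <iostream>

int main()
{
    const int n = 1000000;
    // One allocation up front instead of n tiny allocations inside the loop.
    float* data = new float[n];

    // Threads only write into their own slice of the preallocated block.
    #pragma omp parallel for
    for(int i = 0; i < n; ++i) {
        data[i] = 5.0f;
    }

    std::cout << data[0] << " ... " << data[n - 1] << std::endl;
    delete[] data;
    return 0;
}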
Upvotes: 3
Reputation: 54138
Why are you using int* as the inner vector member? That is very wasteful - you have 4 bytes (sizeof(int), strictly) of data and 2-3 times as much again of heap control structure for every vector entry. Try it using just vector<int> and see if it runs better; a sketch of the change follows below.
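A minimal sketch of that suggestion, reusing the loop bounds from the question (the iteration counts are just the original ones, not tuned):
#include <iostream>
#include <vector>

int main()
{
    // Store the ints by value instead of one heap allocation per element.
    std::vector<std::vector<int> > nts(100);

    #pragma omp parallel for
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int i = 0; i < 1000000; ++i) {
            nts[begin].push_back(5);   // no separate heap block per int
        }
    }

    std::cout << "filled " << nts.size() << " vectors" << std::endl;
    return 0;   // each vector releases its own storage automatically
}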
I'm not an OpenMP expert, but this usage seems odd in its asymmetry: you fill the vectors in a parallel section and free them in non-parallel code. I cannot tell you whether that is wrong, but it 'feels' wrong.
Upvotes: 0