Reputation: 325
I have Thrust code which loads a big array of data (~2.4 GB) into memory, performs calculations whose results (~1.5 GB) are stored on the host, then frees the initial data, loads the results onto the device, performs further calculations on them, and finally reloads the initial data. The Thrust code looks like this:
thrust::host_vector<float> hostData;
// here is code which loads ~2.4 GB of data into hostData
thrust::device_vector<float> deviceData = hostData;
thrust::host_vector<float> hostResult;
// here is code which performs calculations on deviceData and copies the result to hostResult (~1.5 GB)
free<thrust::device_vector<float> >(deviceData);
thrust::device_vector<float> deviceResult = hostResult;
// here is code which performs calculations on deviceResult and also stores some results on the device
free<thrust::device_vector<float> >(deviceResult);
deviceData = hostData;
With my free function defined as:
template<class T> void free(T &V) {
    V.clear();
    V.shrink_to_fit();
    size_t mem_tot;
    size_t mem_free;
    cudaMemGetInfo(&mem_free, &mem_tot);
    std::cout << "Free memory : " << mem_free << std::endl;
}
template void free<thrust::device_vector<int> >(thrust::device_vector<int>& V);
template void free<thrust::device_vector<float> >(thrust::device_vector<float>& V);
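(As a point of comparison, the swap-with-a-temporary idiom is another common way to release a Thrust vector's storage; the sketch below uses an illustrative helper name, release, that is not part of my code above.)
#include <thrust/device_vector.h>

template <class T>
void release(thrust::device_vector<T>& v) {
    // Swapping with an empty temporary transfers v's storage to the temporary,
    // which deallocates it when it is destroyed at the end of this statement.
    thrust::device_vector<T>().swap(v);
}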
However, I get a 'thrust::system::detail::bad_alloc' what(): std::bad_alloc: out of memory error when trying to copy hostData back to deviceData, even though cudaMemGetInfo reports that at this point I have ~6 GB of free memory on my device. Here is the complete output from the free method:
Free memory : 6295650304
Free memory : 6063775744
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
what(): std::bad_alloc: out of memory
It seems to indicate that the device is out of memory although there is plenty of free memory. Is this the right way to free memory for Thrust vectors? I should also note that the code works well for smaller amounts of data (up to 1.5 GB).
Upvotes: 0
Views: 6380
Reputation: 564
I'm providing this answer as I came across this question when searching for answers to the same error message / problem.
Robert Crovella's excellent answer is certainly correct; however, it may be useful for others to know that when creating/requesting a device_vector, the capacity of the device_vector allocated is far greater than the size of the device_vector requested.
This answer, Understanding Thrust (CUDA) memory usage, explains in much better detail why Thrust behaves in this way.
In my case, on Ubuntu 16.04 with a Quadro K1200 and CUDA toolkit 8.0, requesting a device_vector of 67108864 doubles resulted in a device_vector with a capacity 8x larger (536870912) being allocated.
Requested (R) | Capacity (C) | Total Mem | Free Mem | C/Free | R/C
67108864 | 536870912 | 4238540800 | 3137077248 | 0.171137 | 0.125
The output above was from modifying some very helpful code in the answer I linked to.
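For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of one way to print a device_vector's size and capacity alongside the values returned by cudaMemGetInfo; it is not the exact code from the linked answer, and the request size is simply the figure from my example above.
#include <thrust/device_vector.h>
#include <cuda_runtime.h>
#include <iostream>

int main() {
    const size_t n = 67108864;                 // number of doubles requested
    thrust::device_vector<double> v(n);

    size_t mem_free = 0, mem_tot = 0;
    cudaMemGetInfo(&mem_free, &mem_tot);

    std::cout << "size():     " << v.size()     << "\n"   // elements requested
              << "capacity(): " << v.capacity() << "\n"   // elements actually reserved
              << "total mem:  " << mem_tot      << "\n"
              << "free mem:   " << mem_free     << std::endl;
    return 0;
}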
Upvotes: 1
Reputation: 151879
It would be useful to see a complete, compilable reproducer. However, you're probably running into memory fragmentation.
Even though a large amount of memory may be reported as being free, it's possible that it can't be allocated in a single large contiguous chunk. This fragmentation will then limit the maximum size of a single allocation that you can request.
It's probably not really a question of how you are freeing memory, but more a function of what overhead allocations remain after you free the memory. The fact that you are checking the mem info and getting a large number back says to me that you are freeing your allocations correctly.
To try to work around this, one approach would be to manage and re-use your allocations carefully. For instance, if you need a large 2.4 GB working device vector of float on the device, then allocate that once and re-use it for successive operations. Also, if you have any remaining allocations on the device immediately before you try to re-allocate the 2.4 GB vector, then try freeing those (i.e. free all allocations you have made on the device) before re-allocating it.
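A minimal sketch of this re-use approach (the process and workspace names are illustrative, not from the question, and it assumes the result fits in the buffer sized for the initial data):
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

void process(thrust::host_vector<float>& hostData,
             thrust::host_vector<float>& hostResult) {
    // One allocation, sized for the largest data set (here, hostData),
    // made once and kept for the lifetime of the computation.
    thrust::device_vector<float> workspace(hostData.size());

    thrust::copy(hostData.begin(), hostData.end(), workspace.begin());
    // ... first round of calculations on workspace, filling hostResult ...

    // Re-use the same allocation for the (smaller) result data.
    thrust::copy(hostResult.begin(), hostResult.end(), workspace.begin());
    // ... second round of calculations on the first hostResult.size() elements ...

    // Re-use it again for the original data; no new 2.4 GB allocation is needed.
    thrust::copy(hostData.begin(), hostData.end(), workspace.begin());
}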
Upvotes: 2