Reputation: 4082
I am currently working on a CUDA application that will use as much global device memory (VRAM) as is available if the processed data is sufficiently large. What I am allocating is a 3D volume using cudaMalloc3D, so the memory I use must be contiguous. For this purpose I tried retrieving the amount of free device memory with cudaMemGetInfo and then allocating that much. However, this does not work: I still get errors when trying to allocate that amount of memory.
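Roughly, this is what I am doing; a minimal sketch, where the cubic float volume is just a stand-in for my real reconstruction data:

    #include <cmath>
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        size_t freeMem = 0, totalMem = 0;
        cudaMemGetInfo(&freeMem, &totalMem);
        printf("free: %zu MiB of %zu MiB\n", freeMem >> 20, totalMem >> 20);

        // Size a cubic float volume to nominally fill all reported free memory.
        size_t dim = (size_t)cbrt((double)(freeMem / sizeof(float)));
        cudaExtent extent = make_cudaExtent(dim * sizeof(float), dim, dim);

        cudaPitchedPtr volume;
        cudaError_t err = cudaMalloc3D(&volume, extent);
        if (err != cudaSuccess) {
            // This is what I see, typically cudaErrorMemoryAllocation.
            printf("cudaMalloc3D failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // Note that rows are padded to the pitch, so the actual footprint can
        // exceed dim * dim * dim * sizeof(float).
        printf("row pitch: %zu bytes\n", volume.pitch);
        cudaFree(volume.ptr);
        return 0;
    }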
Now, my question is whether there is a way to retrieve the maximum amount of device memory that I can allocate contiguously.
One option would be a trial-and-error approach where I iteratively decrease the amount I try to allocate until allocation succeeds. However, I don't like this idea very much.
Background: I have a program that does cone-beam CT reconstruction on the GPU. These volumes can become quite large, so I split them into chunks when necessary. I therefore need to know the largest chunk size that still fits into global device memory.
Upvotes: 5
Views: 1920
Reputation: 151799
Now, my question is whether there is a way to retrieve the maximum amount of device memory that I can allocate contiguously.
There is not.
The free figure reported by cudaMemGetInfo() does not account for allocation granularity, fragmentation, or overhead the driver may need, so a single allocation of exactly that size can fail. With a bit of trial and error, you can come up with an estimated maximum, say 80% of the available memory reported by cudaMemGetInfo(), and use that.
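As a sketch of that idea (the helper name allocVolumeChunk, the float voxel type, and the back-off loop are illustrative assumptions, not a canonical recipe):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical helper: choose the largest chunk depth (number of
    // width x height slices of float voxels) that fits in ~80% of the
    // reported free memory, backing off further if cudaMalloc3D still
    // refuses. The 80% factor is an estimate, not a guarantee.
    cudaError_t allocVolumeChunk(size_t width, size_t height, size_t maxDepth,
                                 cudaPitchedPtr *chunk, size_t *depth)
    {
        size_t freeMem = 0, totalMem = 0;
        cudaError_t err = cudaMemGetInfo(&freeMem, &totalMem);
        if (err != cudaSuccess) return err;

        size_t budget = freeMem / 10 * 8;                   // ~80% of free memory
        size_t sliceBytes = width * sizeof(float) * height; // lower bound; row pitch adds padding
        size_t d = budget / sliceBytes;
        if (d > maxDepth) d = maxDepth;

        err = cudaErrorMemoryAllocation;
        while (d > 0) {
            err = cudaMalloc3D(chunk, make_cudaExtent(width * sizeof(float), height, d));
            if (err == cudaSuccess) break;
            d -= (d + 9) / 10; // shrink by ~10% and retry
        }
        *depth = d;
        return err;
    }

The caller then streams the full volume through the device depth slices at a time, which is the chunking scheme the question already describes.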
The situation with cudaMalloc is generally similar to a host-side allocator, e.g. malloc. If you queried the host operating system for the available memory, then tried to allocate all of it in a single malloc call, it would likely fail.
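For comparison, here is a rough host-side analogue (a Linux-specific sketch using sysconf; on systems with lazy overcommit the malloc may even appear to succeed and only fault when the memory is touched):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Linux-specific: estimate the currently available physical memory. */
        size_t avail = (size_t)sysconf(_SC_AVPHYS_PAGES) * (size_t)sysconf(_SC_PAGE_SIZE);
        printf("~%zu MiB reported available\n", avail >> 20);

        /* Asking for all of it in one call typically fails (or, with lazy
           overcommit, appears to succeed and faults later), just as a single
           cudaMalloc of all "free" device memory can fail. */
        void *p = malloc(avail);
        printf("malloc of all of it: %s\n", p ? "succeeded" : "failed");
        free(p);
        return 0;
    }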
Upvotes: 7