Reputation: 21723
I am allocating some fairly large float arrays (about 9,000,000 elements each) on the GPU using cudaMalloc((void**)&(storage->data), size * sizeof(float)). At the end of my program, I free this memory using cudaFree(storage->data).
The problem is that the first deallocation is really slow, around 10 seconds, whereas the others are nearly instantaneous.
My question is: what could cause this difference? Is deallocating memory on a GPU usually that slow?
Upvotes: 4
Views: 1905
Reputation: 6425
As pointed out on the NVIDIA forums, it's almost certainly a problem with the way you are timing things rather than with cudaFree.
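One plausible reading of "the way you are timing things": kernel launches return to the host immediately, and cudaFree waits for outstanding device work before it returns, so the first cudaFree can end up being charged for kernels that are still running. The sketch below is only an illustration under that assumption, not the asker's code. The scale kernel, the CHECK macro, and the now_ms and timed_free_ms helpers are hypothetical names introduced here. It also assumes a toolkit that provides cudaDeviceSynchronize (CUDA 4.0 or later; older toolkits used cudaThreadSynchronize instead).

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Placeholder kernel standing in for whatever work the real program does.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host wall-clock time in milliseconds, measured from an arbitrary origin.
static double now_ms() {
    using namespace std::chrono;
    return duration<double, std::milli>(
               steady_clock::now().time_since_epoch()).count();
}

// Times one cudaFree call on the host clock and returns the duration in ms.
static double timed_free_ms(float *ptr) {
    const double start = now_ms();
    CHECK(cudaFree(ptr));
    return now_ms() - start;
}

int main() {
    const size_t n = 9000000;  // array size taken from the question
    float *a = nullptr, *b = nullptr;
    CHECK(cudaMalloc((void **)&a, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&b, n * sizeof(float)));
    CHECK(cudaMemset(a, 0, n * sizeof(float)));
    CHECK(cudaMemset(b, 0, n * sizeof(float)));

    // Launches are asynchronous: control returns before the kernels finish.
    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(a, n, 2.0f);
    scale<<<blocks, threads>>>(b, n, 2.0f);
    CHECK(cudaGetLastError());

    // Wait for pending work first, so it is not attributed to cudaFree.
    const double sync_start = now_ms();
    CHECK(cudaDeviceSynchronize());
    const double sync_ms = now_ms() - sync_start;

    const double free_a_ms = timed_free_ms(a);
    const double free_b_ms = timed_free_ms(b);

    std::printf("synchronize:  %.3f ms\n", sync_ms);
    std::printf("first free:   %.3f ms\n", free_a_ms);
    std::printf("second free:  %.3f ms\n", free_b_ms);
    return 0;
}
```

If the synchronize line absorbs most of the time and both frees come out similar, the 10 seconds probably belonged to earlier kernels rather than to deallocation. Removing the cudaDeviceSynchronize call is a quick way to check whether the cost moves back onto the first cudaFree.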
Upvotes: 3
Reputation: 51435
It should not be that slow; on Linux with CUDA 2.2 it takes a fraction of a second. Have you tried running the host and device profilers to see exactly why it is slow? How many separate allocations do you perform? That does carry some penalty, but not one that large.
Upvotes: 1