Reputation: 95
I have a CUDA kernel written in numba-cuda that processes large arrays that do not fit in GPU memory at once, so I have to call the kernel multiple times to process the arrays in full. The kernel is called in a loop, and inside the loop, after the GPU has finished the computation, I copy and aggregate the results back into a host array.
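Schematically the loop looks like this (a trimmed-down sketch; `process_chunk` is a stand-in for my real kernel and the sizes are arbitrary):

```python
import numpy as np
from numba import cuda

@cuda.jit
def process_chunk(d_in, d_out):
    i = cuda.grid(1)
    if i < d_in.size:
        d_out[i] = d_in[i] * 2.0  # placeholder for the real computation

CHUNK = 1 << 20
host_in = np.random.rand(8 * CHUNK)   # too big to send to the GPU in one piece
host_out = np.empty_like(host_in)

# Device staging buffers for one chunk -- allocated once here,
# or should they go inside the loop? (See the questions below.)
d_in = cuda.device_array(CHUNK, dtype=np.float64)
d_out = cuda.device_array(CHUNK, dtype=np.float64)

threads = 256
blocks = (CHUNK + threads - 1) // threads

for start in range(0, host_in.size, CHUNK):
    d_in.copy_to_device(host_in[start:start + CHUNK])
    process_chunk[blocks, threads](d_in, d_out)
    d_out.copy_to_host(host_out[start:start + CHUNK])  # aggregate on the host
```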
My questions:

- What is the lifetime of a device array and an array that is copied to GPU memory? Are their values preserved from one kernel call to another?
- Do I need to put the device array definitions inside the loop (before I call the kernel), or do I just do it once before I enter the loop?
- Do I need to free/delete the device arrays manually in the code, or will the CUDA memory manager do it at the end of the program?

Thanks.
Upvotes: 1
Views: 561
Reputation: 72344
- What is the lifetime of a device array and an array that is copied to GPU memory? Are their values preserved from one kernel call to another?
In Numba, global memory allocations persist until they are freed, so the contents of a device array are preserved from one kernel call to the next.
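For example (a minimal sketch; the `add_one` kernel is only for illustration), the value written by the first launch is still visible to the second:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] += 1.0

d_arr = cuda.to_device(np.zeros(1024, dtype=np.float64))
threads = 256
blocks = (d_arr.size + threads - 1) // threads

add_one[blocks, threads](d_arr)  # first launch writes 1.0
add_one[blocks, threads](d_arr)  # second launch sees 1.0 and writes 2.0

assert np.all(d_arr.copy_to_host() == 2.0)  # contents persisted between launches
```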
- Do I need to put the device array definitions inside the loop (before I call the kernel), or do I just do it once before I enter the loop?
The latter. Allocate the device arrays once before the loop and reuse them on every iteration, as in your sketch; reallocating them inside the loop only adds unnecessary allocation overhead.
- Do I need to free/delete the device arrays manually in the code, or will the CUDA memory manager do it at the end of the program?
The first thing to realize is that there is no CUDA memory manager of the kind you imagine. Memory allocations are automatically freed when a context is destroyed; otherwise they are not, under any circumstances. The only exception is a Numba device array, which may be garbage collected by Python when it falls out of scope. In general, though, you should assume that anything you allocate remains in memory until you explicitly free it, and always include explicit memory deallocation in your code.
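With Numba specifically, explicit release can look like the following sketch. Dropping the last reference queues the deallocation; `deallocations.clear()` flushes Numba's deferred-deallocation queue, and `get_memory_info()` is only used here to observe the effect:

```python
from numba import cuda

ctx = cuda.current_context()
print(ctx.get_memory_info())         # (free, total) bytes before allocating

d_buf = cuda.device_array(1 << 24)   # 2**24 float64 elements, ~128 MB

del d_buf                            # drop the last reference...
ctx.deallocations.clear()            # ...and force Numba's deferred frees to run

print(ctx.get_memory_info())         # the free-memory figure recovers
```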
Upvotes: 1