Reputation: 18860
The following code is widely used for GPU global memory allocation:
float *M;
cudaMalloc((void**)&M,size);
I wonder why we have to pass a pointer to a pointer to cudaMalloc, and why it was not designed like this instead:
float *M;
cudaMalloc((void*)M,size);
Thanks for any plain descriptions!
Upvotes: 3
Views: 568
Reputation: 120711
To explain the need in a little more detail:
Before the call to cudaMalloc, M points... anywhere; its value is undefined. After the call, you want a valid array to be present at the memory location it points to. One could naïvely say "then just allocate the memory at this location", but that is of course not possible in general: the undefined address will normally not even be inside valid memory. cudaMalloc needs to be able to choose the location. But if the pointer is passed by value, there is no way to tell the caller which location was chosen.
In C++, one could make the signature
template<typename PointerType>
cudaError_t cudaMalloc(PointerType& ptr, size_t size);
where passing ptr by reference allows the function to change the location, but since cudaMalloc is part of the CUDA C API, this is not an option. The only way to pass something as modifiable in C is to pass a pointer to it. And since the object is itself a pointer, what you need to pass is a pointer to a pointer.
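To make the by-value versus pointer-to-pointer difference concrete, here is a minimal host-only sketch in plain C. It does not call CUDA at all: host malloc stands in for the device allocator, and the names alloc_status, alloc_by_value, alloc_by_pointer and demo_out_parameter are made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status type for this sketch; not part of CUDA. */
typedef enum { ALLOC_OK = 0, ALLOC_FAILED = 1 } alloc_status;

/* Receives a COPY of the caller's pointer: assigning to p changes only the copy. */
alloc_status alloc_by_value(void *p, size_t size)
{
    p = malloc(size);              /* the caller never sees this address */
    if (p == NULL) {
        return ALLOC_FAILED;
    }
    free(p);                       /* release it here, since nobody else can */
    return ALLOC_OK;
}

/* Receives the ADDRESS of the caller's pointer, so the result can be stored there. */
alloc_status alloc_by_pointer(void **p, size_t size)
{
    *p = malloc(size);
    return (*p != NULL) ? ALLOC_OK : ALLOC_FAILED;
}

/* Returns 1 if the by-value call left M untouched and the by-pointer call set it. */
int demo_out_parameter(void)
{
    float *M = NULL;
    size_t size = 16 * sizeof(float);

    alloc_by_value((void *)M, size);
    int unchanged = (M == NULL);   /* M is still NULL after the by-value call */

    /* Same cast as in the question: pass the address of M. */
    alloc_status st = alloc_by_pointer((void **)&M, size);
    int was_set = (st == ALLOC_OK && M != NULL);

    free(M);
    return unchanged && was_set;
}
```

The by-value version can allocate as much as it likes, but the address stays in its local copy of the pointer, which is why the design from the question could not work.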
Upvotes: 0
Reputation: 24847
cudaMalloc needs to write the value of the pointer to M (not *M), so M must be passed by reference, that is, by its address &M.
Another way would be to return the pointer in the classic malloc fashion. Unlike malloc, however, cudaMalloc returns an error status, like almost all CUDA runtime functions, so the return value is not available for the pointer.
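The trade-off between the two styles can be sketched in plain C. Everything here is hypothetical: sketch_error, sketch_alloc and sketch_alloc_classic are invented names, host malloc stands in for the device allocator, and rejecting a size of 0 is an assumption of this sketch, not a statement about how cudaMalloc behaves.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch; not the CUDA ones. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERROR_INVALID_VALUE = 1,
    SKETCH_ERROR_OUT_OF_MEMORY = 2
} sketch_error;

/* Status-returning style: the return value carries the error,
   so the pointer has to travel back through an out-parameter. */
sketch_error sketch_alloc(void **ptr, size_t size)
{
    if (ptr == NULL) {
        return SKETCH_ERROR_INVALID_VALUE;
    }
    *ptr = NULL;
    if (size == 0) {               /* assumption made for this sketch only */
        return SKETCH_ERROR_INVALID_VALUE;
    }
    *ptr = malloc(size);
    return (*ptr != NULL) ? SKETCH_SUCCESS : SKETCH_ERROR_OUT_OF_MEMORY;
}

/* malloc style: the return value carries the pointer,
   so every failure collapses into a single NULL. */
void *sketch_alloc_classic(size_t size)
{
    void *p = NULL;
    return (sketch_alloc(&p, size) == SKETCH_SUCCESS) ? p : NULL;
}
```

With the classic wrapper the caller can no longer tell an invalid argument from an out-of-memory condition, which is the information the status-returning style preserves.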
Upvotes: 6