Reputation: 303
I started CUDA last week, as I have to convert an existing C++ program to CUDA for my research.
This is a basic example from the CUDA by Example book, which I recommend to anyone who wants to learn CUDA!
Can someone explain how you can allocate GPU memory through 'dev_c', which is an empty pointer?
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, N * sizeof(int) ) );
Then, how can we pass 'dev_c' to the kernel 'add' without assigning it any values, yet treat *c as an array inside the global function and write to it from within the function? Why is this possible when it's not defined as an array anywhere?
add<<<N,1>>>( dev_a, dev_b, dev_c );
Finally, where exactly do the terms c[0], c[1] etc. get saved when performing the following addition?
c[tid] = a[tid] + b[tid];
I hope I am explaining myself well, but feel free to ask any follow-up questions. I'm new to C as well as CUDA, so be nice :D
Entire code below:
#include "book.h"

#define N 1000

__global__ void add( int *a, int *b, int *c ) {
    int tid = blockIdx.x;    // this thread handles the data at its thread id
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

int main( void ) {
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    // allocate the memory on the GPU
    HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N * sizeof(int) ) );
    HANDLE_ERROR( cudaMalloc( (void**)&dev_b, N * sizeof(int) ) );
    HANDLE_ERROR( cudaMalloc( (void**)&dev_c, N * sizeof(int) ) );

    // fill the arrays 'a' and 'b' on the CPU
    for (int i=0; i<N; i++) {
        a[i] = -i;
        b[i] = i * i;
    }

    // copy the arrays 'a' and 'b' to the GPU
    HANDLE_ERROR( cudaMemcpy( dev_a, a, N * sizeof(int),
                              cudaMemcpyHostToDevice ) );
    HANDLE_ERROR( cudaMemcpy( dev_b, b, N * sizeof(int),
                              cudaMemcpyHostToDevice ) );

    add<<<N,1>>>( dev_a, dev_b, dev_c );

    // copy the array 'c' back from the GPU to the CPU
    HANDLE_ERROR( cudaMemcpy( c, dev_c, N * sizeof(int),
                              cudaMemcpyDeviceToHost ) );

    // display the results
    for (int i=0; i<N; i++) {
        printf( "%d + %d = %d\n", a[i], b[i], c[i] );
    }

    // free the memory allocated on the GPU
    HANDLE_ERROR( cudaFree( dev_a ) );
    HANDLE_ERROR( cudaFree( dev_b ) );
    HANDLE_ERROR( cudaFree( dev_c ) );

    return 0;
}
Thank you!
Upvotes: 0
Views: 3311
Reputation: 152184
It's not going to be possible to teach CUDA in the space of an SO question. I will try to answer your questions, but you should probably avail yourself of some resources. It will be especially difficult if you don't know C or C++, because typical CUDA programming depends on those.
You might want to take some introductory webinars here such as:
GPU Computing using CUDA C – An Introduction (2010): An introduction to the basics of GPU computing using CUDA C. Concepts will be illustrated with walkthroughs of code samples. No prior GPU computing experience required.
GPU Computing using CUDA C – Advanced 1 (2010): First-level optimization techniques such as global memory optimization and processor utilization. Concepts will be illustrated using real code examples.
Now to your questions:
Can someone explain how you can assign GPU memory with 'dev_c' which is an empty pointer?
dev_c starts out as an empty pointer. But the cudaMalloc function allocates GPU memory according to the size passed to it, establishes a pointer to that allocation, and stores that pointer into dev_c. It can do this because we are passing the address of dev_c, not the pointer's value itself.
Then, not pass any 'dev_c' values when calling the function 'add' but treat *c as an array in the global function and write to it from within the function? Why is this possible when its not defined as an array anywhere?
In C, a pointer (which is what dev_c is) can point to a single value or to an array of values. The pointer itself does not contain any information about how much data it points to. Since dev_c stores the result, and it has already been properly initialized by the preceding cudaMalloc call, we can use it to store the result of the operations in the kernel. dev_c actually points to an area of storage holding (an array of) int, whose size is given by N * sizeof(int), as passed to the preceding cudaMalloc call.
Finally, where exactly do the terms c[0], c[1] etc. get saved when performing the following addition?
In C, when we have a function definition like so:
void my_function(int *c){...}
this says that statements within the function can reference a variable named c as if it were a pointer to one or more int values (either a single value or an array of values, stored beginning at the location pointed to by c).
When we call that function, we can pass some other variable as the argument for the function parameter called c, like so:
int my_ints[32];
my_function(my_ints);
Now, inside my_function, wherever the parameter c is referenced, it will use the argument value given by the pointer my_ints.
The same concepts hold for CUDA functions (kernels) and their arguments and parameters: the storage written through c[tid] is the GPU allocation that dev_c points to, which is why cudaMemcpy can later copy it back to the host array c.
Upvotes: 2