Andrea Sylar Solla

Reputation: 157

Kernel variable locations

I'm writing a kernel that works on a very large number of variables stored in structs and arrays. I know that if I use variables allocated through cudaMalloc (that's global memory, right?), the computation will be very slow (I've tried it, and the result is slower than the sequential version of my algorithm).

If I copy the data arrays into the kernel's variables, will that improve my performance?

That kernel memory (it should be called "local memory", right?) should be faster than global memory?

Upvotes: 0

Views: 115

Answers (2)

chaohuang

Reputation: 4115

Local memory is as slow as global memory. If your data is too big to fit in registers or shared memory and you don't need write access, you can try texture memory or constant memory, which are cached and hence faster than global memory.
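For example, a small read-only lookup table can go in constant memory. A minimal sketch (the names and table size here are illustrative, not from the question):

#include <cstdio>
#include <cuda_runtime.h>

__constant__ float c_coeffs[256];  // read-only table in cached constant memory

__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * c_coeffs[i % 256];  // cached constant read
}

int main()
{
    float h_coeffs[256];
    for (int i = 0; i < 256; ++i) h_coeffs[i] = 1.0f / (i + 1);
    // Constant memory is filled with cudaMemcpyToSymbol, not cudaMemcpy.
    cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs));
    // ... allocate in/out with cudaMalloc, launch scale<<<blocks, threads>>>, etc.
    return 0;
}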

Upvotes: 1

user_48349383

Reputation: 517

I think you are a bit confused about the way CUDA works. I will try to help as best I can, but I strongly recommend you take a look at the CUDA Programming Guide as well as the examples included with CUDA. For your work on structs, I would recommend the Black-Scholes example.

I know that if I use variables allocated through cudaMalloc (that's global memory, right?), the computation will be very slow (I've tried it, and the result is slower than the sequential version of my algorithm)

Yes, cudaMalloc allocates in global memory on the GPU device, correct. The computation itself should not necessarily be slow; however, copying a large amount of data to device (GPU) memory might be, depending on your definition of slow. It is always good to limit copying memory to the device in CUDA.
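An illustrative pattern (not your code): allocate once and copy the whole array in a single bulk cudaMemcpy, then reuse the device buffer across kernel launches, rather than transferring pieces repeatedly:

#include <cuda_runtime.h>

int main()
{
    const int n = 1 << 20;
    float *h_data = new float[n];               // host array
    for (int i = 0; i < n; ++i) h_data[i] = float(i);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));     // one allocation
    cudaMemcpy(d_data, h_data, n * sizeof(float),
               cudaMemcpyHostToDevice);         // one bulk transfer

    // ... launch kernels that reuse d_data across many calls ...

    cudaFree(d_data);
    delete[] h_data;
    return 0;
}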

If I copy the data arrays into the kernel's variables, will that improve my performance? That kernel memory (it should be called "local memory", right?) should be faster than global memory?

This question doesn't quite make sense as asked; it suggests a misunderstanding of how device memory works. "Local memory" in CUDA is per-thread storage that physically resides in global memory, so it is not faster.

Do not worry about memory optimization until you get further along. In particular, you should be checking every single CUDA call for an error, especially cudaMalloc and cudaMemcpy; otherwise you will have some serious problems.
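A common way to do this is a small checking macro (a sketch; the macro name is my own, not from the CUDA SDK):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err = (call);                                      \
        if (err != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                \
                    cudaGetErrorString(err), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaMalloc(&d_ptr, bytes));
//   CUDA_CHECK(cudaMemcpy(d_ptr, h_ptr, bytes, cudaMemcpyHostToDevice));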

If you are planning on really learning GPU programming, I recommend reading a lot about it and checking out the sample programs. If not, you should definitely check out some of the existing software for using GPUs without being a programmer. In particular, Thrust is really excellent for this purpose, especially for map/reduce-style tasks.
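To give a flavor of that, here is a minimal Thrust map/reduce sketch (square each element, then sum) that needs no hand-written kernel; the functor name and data are illustrative:

#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cstdio>

struct square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

int main()
{
    thrust::device_vector<float> d(1000, 2.0f);   // 1000 copies of 2.0 on the GPU
    float sum = thrust::transform_reduce(d.begin(), d.end(),
                                         square(), 0.0f,
                                         thrust::plus<float>());
    printf("sum of squares = %f\n", sum);         // expect 4000.0
    return 0;
}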

Upvotes: 0
