Understanding CUDA indexing

Question

I inherited some CUDA code that I need to work on, but some of its indexing confuses me.

A simple example is the normalisation of data. Say we have a shared array A[2*N], a matrix of shape 2xN that has been unrolled into a flat array. We also have the normalisation means and standard deviations, norm_means[2] and norm_stds[2]. The goal is to normalise the data in A in parallel. A minimal example would be:

__global__ void normalise(float *data, float *norm, float *std) {
int tdy = threadIdx.y;

for (int i=tdy; i>>(A_d, norm_means_d, norm_stds_d);

}

Note that I am using Eigen for the matrix generation. I have omitted the includes for brevity.

The code above works and produces the desired results, but I do not understand why. The kernel makes no sense to me: the for loop should stop after one execution, since i>D after the first iteration, but it doesn't.
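For reference, a loop that starts at the thread index is commonly written as a thread-stride loop, where each thread begins at its own index and steps forward by the number of threads in the block. The sketch below is a minimal, self-contained illustration of that pattern and not the inherited code. The kernel name normalise_strided, the helper run_normalise_check, the sizes, the column-major layout (Eigen's default), the (x - mean) / std formula and the stride blockDim.y are all assumptions made for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int ROWS = 2;          // assumed: the matrix is 2 x N
constexpr int N = 8;             // assumed column count, illustration only
constexpr int TOTAL = ROWS * N;  // length of the unrolled array

// Thread-stride loop: thread t handles elements t, t + blockDim.y,
// t + 2 * blockDim.y, ... so the threads of one block together visit
// every index in [0, total) exactly once.
__global__ void normalise_strided(float *data, const float *mean,
                                  const float *stddev, int rows, int total) {
    for (int i = threadIdx.y; i < total; i += blockDim.y) {
        int row = i % rows;  // assumes column-major storage (Eigen's default)
        data[i] = (data[i] - mean[row]) / stddev[row];
    }
}

// Hypothetical helper: runs the kernel on known data and returns the number
// of wrong elements, or -1 if a CUDA call failed.
int run_normalise_check() {
    float h_data[TOTAL];
    const float h_mean[ROWS] = {10.0f, 100.0f};
    const float h_std[ROWS] = {2.0f, 4.0f};
    for (int c = 0; c < N; ++c) {
        h_data[c * ROWS + 0] = 10.0f + c;          // row 0, column c
        h_data[c * ROWS + 1] = 100.0f + 2.0f * c;  // row 1, column c
    }

    float *d_data = nullptr, *d_mean = nullptr, *d_std = nullptr;
    bool ok = cudaMalloc(&d_data, sizeof(h_data)) == cudaSuccess &&
              cudaMalloc(&d_mean, sizeof(h_mean)) == cudaSuccess &&
              cudaMalloc(&d_std, sizeof(h_std)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);
        cudaMemcpy(d_mean, h_mean, sizeof(h_mean), cudaMemcpyHostToDevice);
        cudaMemcpy(d_std, h_std, sizeof(h_std), cudaMemcpyHostToDevice);

        // One block with 4 threads along y, so blockDim.y == 4.
        normalise_strided<<<1, dim3(1, 4)>>>(d_data, d_mean, d_std, ROWS, TOTAL);

        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_data, d_data, sizeof(h_data),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    cudaFree(d_mean);
    cudaFree(d_std);
    if (!ok) return -1;

    // With the inputs above, both rows should normalise to c / 2.
    int mismatches = 0;
    for (int i = 0; i < TOTAL; ++i) {
        float expected = 0.5f * (i / ROWS);
        if (std::fabs(h_data[i] - expected) > 1e-6f) ++mismatches;
    }
    return mismatches;
}

int main() {
    int mismatches = run_normalise_check();
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

In this sketch the loop body runs several times per thread whenever the array has more elements than the block has threads. With a stride of blockDim.y, the loop variable only exceeds the bound after the thread has covered its share of the elements.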

If I change the kernel to something that makes more sense to me, e.g.

__global__ void normalise(float *data, float *norm, float *std) {
int tdy = threadIdx.y;
for (int i=0; i



the program stops working and just outputs gibberish data.

Can somebody explain why I get this behaviour?

PS: I am very new to CUDA.
