Fjolfrin

Reputation: 33

How to pass a vector's data to a CUDA kernel?

So I have this assignment about convolution, where I have to apply a .wav filter to another .wav file. I have to do this using CUDA. This is my CUDA kernel:


__global__ void MyConvolveCUDA(const double* A, const double* B, double* C, int n, int m) {

    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;

    int min, max;
    if (i >= m - 1) min = i - m + 1; else min = 0;
    if (i < n - 1) max = i; else max = n - 1;

    if (j <= min) j = min;
    else if (j >= max) j = max;

    C[i] = A[i] * B[j - i];
}

and this is the function where I try it. I have used a custom lib for reading the audio files (they are read correctly and everything), so I'm gonna simplify the audio files' part of the code:


void MyConvolveCUDA_Run() {

    //Let's say that 'filter' is the filter I want to apply to the 'audio' file. 'output' is the file I
    //want to export in the end. The '.samples' member gives access to the samples of the audio file,
    //and the 'save' function saves the file using the given name.

    int n = audio.samples.size(),
        m = filter.samples.size();

    //These are the device copies of the data I want to process.
    double* audioCUDA = nullptr;
    double* filterCUDA = nullptr;
    double* outputCUDA = nullptr;

    cudaMalloc((void **)&audioCUDA, n * sizeof(double));
    cudaMalloc((void **)&filterCUDA, n * sizeof(double));
    cudaMalloc((void **)&outputCUDA, (n + m - 1) * sizeof(double));

    cudaMemcpy(audioCUDA, audio.samples[0].data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(filterCUDA, filter.samples[0].data(), m * sizeof(double), cudaMemcpyHostToDevice);

    MyConvolveCUDA<<<32, 32>>>(audioCUDA, filterCUDA, outputCUDA, n, m);
    cudaDeviceSynchronize();

    cudaMemcpy(output.samples[0].data(), outputCUDA, (n + m - 1) * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(audioCUDA); cudaFree(filterCUDA); cudaFree(outputCUDA);

    output.save("CUDA_output.wav");
}

Can you understand what's going wrong?? I want to check on the arrays I pass in to MyConvolveCUDA, but every time I try I get an access violation error.

Thanks in advance!

Upvotes: 1

Views: 279

Answers (1)

You are launching the CUDA kernel as MyConvolveCUDA<<<32,32>>>, which means 32 blocks of 32 threads each (1024 threads in total). Inside the kernel you use 2D thread indexing (threadIdx.y, blockIdx.y), but the launch is only one-dimensional.

MyConvolveCUDA<<<M,N>>> is interpreted as

MyConvolveCUDA<<<dim3(M,1,1),dim3(N,1,1)>>>

where M is the number of blocks and N is the number of threads per block, i.e. threads are launched in the x direction only. With such a launch, threadIdx.y and blockIdx.y are always 0, so j in your kernel always starts at 0.
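The equivalence above can be checked with a small experiment. This is only an illustrative sketch: the kernel name ShowLaunchShape is hypothetical and not part of the question, and the order in which the blocks print is not guaranteed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the first thread of each block reports the launch geometry.
__global__ void ShowLaunchShape() {
    if (threadIdx.x == 0) {
        printf("block (%u,%u) of grid (%u,%u), block size (%u,%u), threadIdx.y = %u\n",
               blockIdx.x, blockIdx.y, gridDim.x, gridDim.y,
               blockDim.x, blockDim.y, threadIdx.y);
    }
}

int main() {
    // Scalar launch configuration: 2 blocks of 4 threads.
    ShowLaunchShape<<<2, 4>>>();
    cudaDeviceSynchronize();

    // The same configuration written out with dim3.
    ShowLaunchShape<<<dim3(2, 1, 1), dim3(4, 1, 1)>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Both launches should report a y extent of 1 for the grid and the block, with blockIdx.y and threadIdx.y equal to 0.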

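If you want to launch it in 2 dimensions, pass two-dimensional dim3 values for both the grid and the block, e.g. MyConvolveCUDA<<<dim3(Bx,By),dim3(Tx,Ty)>>>, where Bx, By are the numbers of blocks and Tx, Ty the numbers of threads per block in x and y.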
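For a 1D convolution, a one-dimensional launch may well be enough. The following is a minimal sketch under stated assumptions, not the asker's kernel: ConvolveSketch is a hypothetical name, each thread computes one output sample by looping over the overlapping inputs, and the block count is rounded up so that all n + m - 1 outputs are covered. The block size of 256 is an arbitrary choice.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per output sample of the full
// convolution of A (n samples) with B (m samples).
__global__ void ConvolveSketch(const double* A, const double* B, double* C,
                               int n, int m) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n + m - 1) return;  // the grid may be larger than the output

    // Range of k with 0 <= k < n and 0 <= i - k < m.
    int kmin = (i >= m - 1) ? i - (m - 1) : 0;
    int kmax = (i < n - 1) ? i : n - 1;

    double sum = 0.0;
    for (int k = kmin; k <= kmax; ++k)
        sum += A[k] * B[i - k];
    C[i] = sum;
}

int main() {
    std::vector<double> a = {1.0, 2.0, 3.0};  // stand-in for the audio samples
    std::vector<double> b = {1.0, 1.0};       // stand-in for the filter samples
    int n = static_cast<int>(a.size());
    int m = static_cast<int>(b.size());
    int outLen = n + m - 1;
    std::vector<double> c(outLen);

    double *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, n * sizeof(double));
    cudaMalloc((void**)&dB, m * sizeof(double));
    cudaMalloc((void**)&dC, outLen * sizeof(double));
    cudaMemcpy(dA, a.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), m * sizeof(double), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (outLen + threads - 1) / threads;  // ceiling division
    ConvolveSketch<<<blocks, threads>>>(dA, dB, dC, n, m);
    cudaDeviceSynchronize();

    cudaMemcpy(c.data(), dC, outLen * sizeof(double), cudaMemcpyDeviceToHost);
    for (double v : c) std::printf("%g ", v);  // prints: 1 3 5 3
    std::printf("\n");

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Because every thread writes a different element of C, no synchronization between threads is needed in this layout.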

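To check the arrays inside the kernel, you can print them from a single thread with device-side printf (note %f rather than %ld, since the elements are double):

int i = threadIdx.x + blockIdx.x * blockDim.x;
#if __CUDA_ARCH__ >= 200  // device-side printf needs compute capability 2.0 or newer
    if (i == 0) {  // only one thread prints, to avoid duplicated output
        for (int iter = 0; iter < n; iter++)
            printf("%f ", A[iter]);
        for (int iter = 0; iter < m; iter++)
            printf("%f ", B[iter]);
    }
#endif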
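The arrays can also be inspected on the host side. The question does not show how the inspection was attempted, so this is an assumption: if the access violation comes from reading a device pointer such as audioCUDA directly in host code, copying the data back first avoids it. A minimal sketch, where PrintDeviceArray is a hypothetical helper and the sample values are made up for illustration:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copies `count` doubles from device memory into a
// host buffer and prints them. Device pointers must not be dereferenced
// directly in host code.
void PrintDeviceArray(const char* label, const double* devPtr, int count) {
    std::vector<double> host(count);
    cudaError_t err = cudaMemcpy(host.data(), devPtr, count * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::printf("%s: cudaMemcpy failed: %s\n", label, cudaGetErrorString(err));
        return;
    }
    std::printf("%s:", label);
    for (double v : host) std::printf(" %f", v);
    std::printf("\n");
}

int main() {
    const double samples[] = {0.5, -0.25, 1.0};  // illustrative values only
    double* dev = nullptr;
    cudaMalloc((void**)&dev, sizeof(samples));
    cudaMemcpy(dev, samples, sizeof(samples), cudaMemcpyHostToDevice);

    PrintDeviceArray("audioCUDA", dev, 3);

    cudaFree(dev);
    return 0;
}
```

Checking the cudaError_t returned by each runtime call, as done for cudaMemcpy here, also helps to locate the failing call.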

Upvotes: 1
