Reputation: 33
So I have this assignment about convolution, where I have to apply a .wav filter to another .wav file. I have to do this using CUDA. This is my CUDA kernel:
__global__ void MyConvolveCUDA(const double* A, const double* B, double* C, int n, int m) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;

    int min, max;
    if (i >= m - 1) min = i - m + 1; else min = 0;
    if (i < n - 1)  max = i;         else max = n - 1;

    if (j <= min) j = min;
    else if (j >= max) j = max;

    C[i] = A[i] * B[j - i];
}
and this is the function where I try it. I have used a custom lib for reading the audio files (they are read correctly and everything), so I'm gonna simplify the audio files' part of the code:
void MyConvolveCUDA_Run() {
    // Let's say that 'filter' is the filter I want to apply to the 'audio' file. 'output' is the file I
    // want to export in the end. The '.samples' member accesses the samples of the audio file,
    // and the 'save' function saves the file under the given name.
    int n = audio.samples.size(),
        m = filter.samples.size();

    // These are the device copies of the data I want to process.
    double* audioCUDA = nullptr;
    double* filterCUDA = nullptr;
    double* outputCUDA = nullptr;

    cudaMalloc((void **)&audioCUDA, n * sizeof(double));
    cudaMalloc((void **)&filterCUDA, n * sizeof(double));
    cudaMalloc((void **)&outputCUDA, (n + m - 1) * sizeof(double));

    cudaMemcpy(audioCUDA, audio.samples[0].data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(filterCUDA, filter.samples[0].data(), m * sizeof(double), cudaMemcpyHostToDevice);

    MyConvolveCUDA<<<32, 32>>>(audioCUDA, filterCUDA, outputCUDA, n, m);
    cudaDeviceSynchronize();

    cudaMemcpy(output.samples[0].data(), outputCUDA, (n + m - 1) * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(audioCUDA); cudaFree(filterCUDA); cudaFree(outputCUDA);

    output.save("CUDA_output.wav");
}
Can you see what's going wrong? I want to check the arrays I pass to MyConvolveCUDA, but every time I try I get an access violation error.
Thanks in advance!
Upvotes: 1
Views: 279
Reputation: 112
You are launching the CUDA kernel MyConvolveCUDA as
MyConvolveCUDA<<<32, 32>>>
which means you are launching 32 blocks with 32 threads each (1024 threads in total). In the kernel you use 2D thread indexing, but you launched the threads in only one dimension.
MyConvolveCUDA<<<M, N>>>
is interpreted as
MyConvolveCUDA<<<dim3(M, 1, 1), dim3(N, 1, 1)>>>
where M is the number of blocks and N is the number of threads per block, i.e., threads are launched only in the x direction. Because of that, threadIdx.y and blockIdx.y will always be 0.
If you want to launch it in 2 dimensions, you have to pass dim3 arguments explicitly, e.g. MyConvolveCUDA<<<dim3(M, N), dim3(M, N)>>>, i.e. an M x N grid of blocks with M x N threads per block.
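For this particular kernel, though, only the output index i has to cover all n + m - 1 output samples, so a purely 1D launch sized to the output may be all you need. A minimal sketch, assuming the variable names from your MyConvolveCUDA_Run and an arbitrary block size of 256:

// Minimal sketch: enough 1D blocks so that every output index 0 .. n+m-2 gets a thread.
int outputSize = n + m - 1;
dim3 block(256);
dim3 grid((outputSize + block.x - 1) / block.x);
MyConvolveCUDA<<<grid, block>>>(audioCUDA, filterCUDA, outputCUDA, n, m);
cudaDeviceSynchronize();

With a launch like this the kernel would also need a guard such as if (i >= n + m - 1) return; so the extra threads in the last block don't write out of bounds, and since threadIdx.y is always 0 in a 1D launch, j would have to come from a loop inside the kernel rather than from threadIdx.y.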
To check the arrays inside the kernel, you can print them from a single thread like this:
int i = threadIdx.x + blockIdx.x * blockDim.x;

#if __CUDA_ARCH__ >= 200
    if (i == 0) {
        for (int iter = 0; iter < n; iter++)
            printf("%f ", A[iter]);   // A and B are double, so use %f, not %ld
        for (int iter = 0; iter < m; iter++)
            printf("%f ", B[iter]);
    }
#endif
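Note that the output of an in-kernel printf only shows up after a synchronization point such as cudaDeviceSynchronize(). When debugging, it is also worth checking the CUDA error state on the host right after the launch; a minimal sketch:

// Right after the kernel launch:
cudaError_t launchErr = cudaGetLastError();    // catches invalid launch configurations
if (launchErr != cudaSuccess)
    printf("launch failed: %s\n", cudaGetErrorString(launchErr));

cudaError_t syncErr = cudaDeviceSynchronize(); // catches errors raised while the kernel ran
if (syncErr != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(syncErr));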
Upvotes: 1