Reputation: 1197
I am trying to sum all the pixels in an image and get the average of all pixels using the CUDA NPP library. My image is an 8-bit unsigned char grayscale image of dimension w256 x h1024. I have tried to follow all the required rules of declaring pointers and passing the corresponding NPP-type pointers to the NPP functions.
However, I am getting an "unknown error" when I perform GPU error checking on my code. I tried to debug it, but I can't figure out where I am going wrong, and I would appreciate some help.
I am using OpenCV in addition to this to do my processing, and hence some OpenCV code will be present.
EDIT: Code has been updated
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) getchar();
    }
}
// process image here
// device_pointer initializations
unsigned char *device_input;
unsigned char *device_output;
size_t d_ipimgSize = input.step * input.rows;
size_t d_opimgSize = output.step * output.rows;
gpuErrchk( cudaMalloc( (void**) &device_input, d_ipimgSize) );
gpuErrchk( cudaMalloc( (void**) &device_output, d_opimgSize) );
gpuErrchk( cudaMemcpy(device_input, input.data, d_ipimgSize, cudaMemcpyHostToDevice) );
// Median filter the input image here
// .......
// start summing all pixels
Npp64s *partialSum = 0;
partialSum = (Npp64s *) malloc(sizeof(Npp64s));
int bytes = input.cols*input.rows;
Npp8u *scratch = nppsMalloc_8u(bytes);
int ostep = input.step;
NppiSize imSize;
imSize.width = input.cols;
imSize.height = input.rows;
// copy processed image data into a source_pointer
unsigned char *odata;
odata = (unsigned char*) malloc( sizeof(unsigned char) * input.rows * input.cols);
memcpy(odata, output.data, sizeof(unsigned char) * input.rows * input.cols);
// compute the sum over all the pixels
nppiSum_8u64s_C1R( odata, ostep, imSize, scratch, partialSum );
// print sum
printf( "\n Total Sum cuda %d \n", *partialSum) ;
gpuErrchk(cudaFree(device_input)); // <--- Unknown error here
gpuErrchk(cudaFree(device_output));
Upvotes: 1
Views: 1303
Reputation: 1024
The partialSum argument in nppiSum_8u64s_C1R should be device-allocated memory.
Further, you allocate a scratch buffer the size of your image. The function nppiSumGetBufferHostSize_8u64s_C1R gives you the exact size required for the scratch buffer, which might be larger than the image itself (not very likely for a simple summation, but possible).
Also, always check return values in NPP, just as you do for CUDA. nppiSum_8u64s_C1R probably won't return NPP_NO_ERROR in your case.
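Beyond checking the status code, the GPU result can be sanity-checked against a plain host-side sum. The sketch below is only a CPU baseline, not part of NPP; sumPixelsHost and meanPixelHost are hypothetical names, and stepBytes plays the role of input.step in the question (it may be larger than the width when rows are padded).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side reference: sums an 8-bit single-channel image whose
// rows start stepBytes apart. Padding bytes past `width` are ignored.
std::int64_t sumPixelsHost(const unsigned char *pixels, int width, int height,
                           std::size_t stepBytes)
{
    std::int64_t total = 0;
    for (int y = 0; y < height; ++y) {
        const unsigned char *row = pixels + (std::size_t)y * stepBytes;
        for (int x = 0; x < width; ++x) {
            total += row[x];
        }
    }
    return total;
}

// Average pixel value from a precomputed sum; returns 0 for an empty image.
double meanPixelHost(std::int64_t sum, int width, int height)
{
    if (width <= 0 || height <= 0) return 0.0;
    return (double)sum / ((double)width * (double)height);
}
```

If the NPP sum and this baseline disagree, the usual suspects are a host pointer passed where a device pointer is expected, or a step value that does not match the buffer actually passed in.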
Upvotes: 1