assma

Reputation: 1

Initialize big images using CUDA

I am working on HD image processing using CUDA 7.5 with an NVIDIA GeForce 840M on Ubuntu 14.04. I have a 3750*3750 image, and I have trouble initializing an array of that size on the device. The following code only initializes the array up to k of about 4000.

__device__ int sImg;

__device__ int *B;

/* ############################### INITIALIZE ############################## */

__global__ void initialize(int *tab, int v, int s)
{    
  int k = blockDim.x*blockIdx.x + threadIdx.x ;
  if ( k < s )
    tab[k] = v;
}

/* ########################### The parent kernel ########################### */

__global__ void EDGE(int *A, int *C ,int h, int w, int dim, int nbScales)
{
  sImg = dim*dim;
  cudaMalloc((void**)&B,sImg*sizeof(int));

  int threadsPerBlock = 256;
  int blocksPerGrid = (sImg + threadsPerBlock -1) / threadsPerBlock;

  /// I have trouble here: the initialization does not complete
  initialize<<<blocksPerGrid,threadsPerBlock>>>(B,0,sImg);
  cudaDeviceSynchronize();
  initialize<<<blocksPerGrid,threadsPerBlock>>>(C,0,sImg);
  cudaDeviceSynchronize();  

  /// A transformation into the frequency domain
  FSDWT <<< 1 , nbScales >>> (A,B, h, w,dim,nbScales);
  cudaDeviceSynchronize();

  /// Thresholding the transform
  Treshold<<<1,1>>>(B,C,dim*dim);
  cudaDeviceSynchronize();

  cudaFree(B);
}

/* ############################  call from host ############################ */

extern "C" void EDGE_host(int *A,int *B,int h,int w,int dim, int nbScales)
{
  EDGE <<< 1 , 1 >>> (A,B, h, w,dim,nbScales);
}

Thank you very much

Upvotes: 0

Views: 220

Answers (3)

assma

Reputation: 1

Thank you very much, everyone. My problem is solved by tera's answer: the problem was indeed with memory allocation. I added the following line in the main function, before calling any kernel:

cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
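For illustration, here is a minimal sketch of where such a call can go on the host side, before the first kernel launch. Only cudaDeviceSetLimit and the EDGE_host wrapper come from the question; everything else (the d_A/d_B names, the main function) is illustrative.

#include <cuda_runtime.h>

extern "C" void EDGE_host(int *A, int *B, int h, int w, int dim, int nbScales);

int main()
{
  // Grow the device-side malloc/cudaMalloc heap to 128 MiB.
  // This must happen before launching any kernel that allocates on the device.
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);

  /* ... cudaMalloc and fill the device copies of A and B here ... */

  // Then launch the parent kernel as before, e.g.:
  // EDGE_host(d_A, d_B, h, w, dim, nbScales);

  cudaDeviceSynchronize();
  return 0;
}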

Upvotes: 0

tera

Reputation: 7255

Memory allocations made from device code are satisfied from a pool of limited size. Either set aside more memory for device-side allocations by calling cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before any such allocation is made, or allocate the memory from the host side using cudaMalloc().

When choosing how much memory to set aside, be aware that a call to malloc() on the device makes a separate allocation for each thread, so requirements increase quickly with the number of threads running in parallel. If, as in your case, the kernel doesn't free the memory, the memory needed grows with the total number of threads run, rather than with the number of threads running in parallel.
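To illustrate the second option, here is a rough sketch of allocating B from the host instead of inside the parent kernel. It assumes EDGE is changed to take B as an extra parameter rather than writing to the global __device__ pointer, and the wrapper's output argument is renamed C to match the kernel; that differs from the code in the question.

/* Sketch: host-side allocation of the intermediate buffer B, passed into the
   parent kernel. Assumes EDGE's signature gains an int *B parameter. */
extern "C" void EDGE_host(int *A, int *C, int h, int w, int dim, int nbScales)
{
  int *B = NULL;
  cudaMalloc((void**)&B, dim * dim * sizeof(int));   // ordinary host-side allocation

  EDGE<<<1, 1>>>(A, B, C, h, w, dim, nbScales);      // no device-side heap needed
  cudaDeviceSynchronize();

  cudaFree(B);
}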

Upvotes: 1

pSoLT

Reputation: 1052

OK, a couple of things:

1) Use cudaMalloc instead of malloc.

2) In the cudaMalloc call, use sizeImage instead of dim*dim (I assume they are the same).

Upvotes: 1
