After fixing the code I posted here (adding *sizeof(float) to the shared memory allocation, though that doesn't matter here because I allocate shared memory through MATLAB), I ran the code, and it successfully returned results of up to sizeof(float)*18*18*5000*100 bytes.
I took the PTX and used it to run the code through MATLAB (it found the right entry point, i.e. the function I wanted to run):
```matlab
kernel = parallel.gpu.CUDAKernel('Tst.ptx','float *,const float *,int');
mask = gpuArray.randn([7,7,1],'single');
toConv = gpuArray.randn([12,12,5],'single');  % random data for testing
setConstantMemory(kernel,'masks',mask);       % copy the mask into constant memory
kernel.ThreadBlockSize = [(12+2*7)-2 (12+2*7)-2 1];
kernel.GridSize = [1 5 1];  % first element: how many convolution masks
                            % second element: how many matrices to convolve
kernel.SharedMemorySize = 24*24*4;
foo = gpuArray.zeros([18 18 5 1],'single');   % result size
foo = reshape(foo,[numel(foo) 1]);
toConv = reshape(toConv,[numel(toConv) 1]);
foo = feval(kernel,foo,toConv,12);
```
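As a quick sanity check, the launch parameters above can be compared with the GTX 480 per-block limits given in this post. The Python sketch below is only arithmetic; it assumes that the 24 in ThreadBlockSize is the 12×12 input padded by (mask side − 1) on each edge and that 18 is the full-convolution output side. Both are my reading of the numbers, not something the (unshown) kernel source confirms.

```python
# Per-block limits for the GTX 480 (compute capability 2.0), as stated in the post.
MAX_THREADS_PER_BLOCK = 1024
MAX_SHARED_BYTES = 48 * 1024
FLOAT_BYTES = 4  # single precision

n, k = 12, 7        # input side (toConv) and mask side (mask)
num_matrices = 5    # third dimension of toConv

padded = n + 2 * (k - 1)   # equals (12+2*7)-2 in the MATLAB code -> 24
out_side = n + k - 1       # full-convolution output side -> 18

threads = padded * padded               # ThreadBlockSize [24 24 1]
shared = padded * padded * FLOAT_BYTES  # SharedMemorySize 24*24*4
blocks = 1 * num_matrices               # GridSize [1 5 1]

fits = threads <= MAX_THREADS_PER_BLOCK and shared <= MAX_SHARED_BYTES
print(out_side, threads, shared, blocks, fits)
```

Both the thread count and the shared-memory request sit well inside the per-block limits, so neither one explains CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES on its own.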
I get:

```
Error using parallel.gpu.CUDAKernel/feval An unexpected error occurred trying to launch a kernel. The CUDA error was: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Error in tst (line 12) foo=feval(kernel,foo,toConv,12);
```
Out of resources for such a small example? The same code worked in Visual Studio for a problem a hundred thousand times larger.
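The "hundred thousand times larger" figure checks out. Here is a short Python sketch of the arithmetic, assuming 4-byte single-precision floats. The helper name `result_bytes` is purely illustrative.

```python
FLOAT_BYTES = 4  # assumes 32-bit single-precision floats

def result_bytes(*dims: int) -> int:
    """Bytes occupied by a single-precision array with the given dimensions."""
    total = FLOAT_BYTES
    for d in dims:
        total *= d
    return total

large = result_bytes(18, 18, 5000, 100)  # the run that worked in Visual Studio
small = result_bytes(18, 18, 5, 1)       # foo in the MATLAB test
print(large, small, large // small)      # 648000000 6480 100000
```

The large run needs about 618 MiB, which fits in the card's roughly 1.5 GB. The failing test needs only 6480 bytes, so total data size cannot be the resource that ran out.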
I have a GTX 480 (compute capability 2.0, about 1.5 GB of memory, 1024 max threads per block, 48 KB of shared memory). Visual Studio's ptxas output:
```
1> ptxas : info : 0 bytes gmem, 25088 bytes cmem[2]
1> ptxas : info : Compiling entry function '_Z6myConvPfPKfi' for 'sm_21'
1> ptxas : info : Function properties for _Z6myConvPfPKfi
1> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas : info : Used 10 registers, 44 bytes cmem[0]
```
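The CUDA driver API documentation says CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES usually means the launch specifies too many threads for the kernel's register count. The minimal Python sketch below works through that budget for the 576-thread block. It uses the compute capability 2.x limit of 32K 32-bit registers per block and ignores register allocation granularity, which would only make the real limit stricter. The post does not show the register count of the default (Debug) build, so the idea that it crossed this threshold is an assumption. It is consistent with the accepted fix, but the post does not confirm it.

```python
# Per-block limits for compute capability 2.x (CUDA C Programming Guide).
REGS_PER_BLOCK = 32 * 1024     # 32-bit registers available to one block
CONST_MEM_BYTES = 64 * 1024    # total constant memory

threads_per_block = 24 * 24    # ThreadBlockSize [24 24 1]

def regs_needed(regs_per_thread: int, threads: int = threads_per_block) -> int:
    """Registers one block needs (simplified model, no allocation granularity)."""
    return regs_per_thread * threads

def max_regs_per_thread(threads: int = threads_per_block) -> int:
    """Largest per-thread register count that still fits one block of `threads`."""
    return REGS_PER_BLOCK // threads

release_regs = 10                    # "Used 10 registers" in the ptxas output
print(regs_needed(release_regs))     # fits easily in 32K
print(max_regs_per_thread())         # more registers than this per thread cannot fit 576 threads
print(25088 <= CONST_MEM_BYTES)      # cmem[2] usage reported by ptxas
```

The Release build has plenty of headroom. An unoptimized debug build that used more than 56 registers per thread could not launch 576 threads per block. In MATLAB, the CUDAKernel object's read-only MaxThreadsPerBlock property should report this per-kernel limit once the PTX is loaded, which offers a direct way to compare builds.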
EDIT: Problem resolved by compiling with Configuration Active(Release) and Platform Active(x64).
The problem was resolved by compiling with Configuration Active(Release) and Platform Active(x64) instead of the defaults. Because of backwards compatibility, I'm guessing the fix has less to do with x64 than with compiling for Release rather than Debug.