username_4567

Reputation: 4923

Pinned memory in Nvidia CUDA

I'm writing a matrix addition program for GPUs using streams and, necessarily, pinned memory. I allocated 3 matrices in pinned memory, but beyond a certain matrix dimension the program reports API error 2: out of memory. My machine has 4GB of RAM, but I'm not able to use more than 800MB. Is there any way to control this upper limit?

My system configuration: NVIDIA GeForce 9800 GTX, Intel Core 2 Quad.

For streamed execution, the code looks as follows:

// Each stream handles one contiguous chunk of n/no_of_streams elements.
for(int i=0;i<no_of_streams;i++)
    {
       // Copy this stream's chunk of each matrix from pinned host memory to the device.
       cudaMemcpyAsync(device_a+i*(n/no_of_streams),hAligned_on_host_a+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyHostToDevice,streams[i]);
       cudaMemcpyAsync(device_b+i*(n/no_of_streams),hAligned_on_host_b+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyHostToDevice,streams[i]);
       cudaMemcpyAsync(device_c+i*(n/no_of_streams),hAligned_on_host_c+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyHostToDevice,streams[i]);
       // Add the chunk in the same stream, so it runs after the copies above.
       matrixAddition<<<blocks,threads,0,streams[i]>>>(device_a+i*(n/no_of_streams),device_b+i*(n/no_of_streams),device_c+i*(n/no_of_streams));
       // Copy the chunk of each matrix back to pinned host memory.
       cudaMemcpyAsync(hAligned_on_host_a+i*(n/no_of_streams),device_a+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyDeviceToHost,streams[i]);
       cudaMemcpyAsync(hAligned_on_host_b+i*(n/no_of_streams),device_b+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyDeviceToHost,streams[i]);
       cudaMemcpyAsync(hAligned_on_host_c+i*(n/no_of_streams),device_c+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyDeviceToHost,streams[i]);
    }

Upvotes: 2

Views: 1559

Answers (1)

P O'Conbhui

Reputation: 1223

You haven't specified whether the error occurs after the cudaMalloc calls or the cudaHostAlloc calls.
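One way to narrow this down is to check the return value of each allocation separately. The following is only a sketch under assumptions: the helper names `check` and `try_allocations` are hypothetical, and the buffers are assumed to hold `float`. In the CUDA runtime, error code 2 is `cudaErrorMemoryAllocation`, which both `cudaHostAlloc` and `cudaMalloc` can return, so the message alone does not say which side ran out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print which call failed and why; true on success.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical probe: returns 0 if both allocations succeed,
// 1 if the pinned host allocation fails, 2 if the device allocation fails.
static int try_allocations(size_t nbytes)
{
    float *host_a = NULL;    // assumed element type
    float *device_a = NULL;

    // Pinned (page-locked) host memory.
    if (!check(cudaHostAlloc((void **)&host_a, nbytes, cudaHostAllocDefault),
               "cudaHostAlloc"))
        return 1;

    // Device global memory.
    if (!check(cudaMalloc((void **)&device_a, nbytes), "cudaMalloc")) {
        cudaFreeHost(host_a);
        return 2;
    }

    cudaFree(device_a);
    cudaFreeHost(host_a);
    return 0;
}
```

Calling `try_allocations` with the per-matrix byte count at which the program starts failing should show on stderr whether the host or the device allocation is the one returning the error.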

Pinned memory is a limited resource. Any memory allocated as pinned (page-locked) must always stay in physical RAM and cannot be paged out. That leaves less RAM for the operating system and other applications, so you can't have 4GB of pinned memory on a machine with 4GB of RAM, or else nothing else could run.

800MB might be a system-imposed limit. Considering it's about a fifth of your RAM, it might be a reasonable one. It is also quite close to the size of your card's global memory. A failure on the card doesn't by itself show up as a failure on the host, so if the error appears without you having to call something like cudaGetLastError, the problem is probably on the host.
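To compare the failing size against the card's global memory, the runtime can be asked directly. This is a rough sketch, not a diagnosis: `to_mib` and `print_device_memory` are hypothetical helper names, and device index 0 is an assumption. Keep in mind that three matrices of equal size need three times the per-matrix byte count on the device as well.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to MiB.
static double to_mib(size_t bytes)
{
    return bytes / (1024.0 * 1024.0);
}

// Hypothetical helper: print total and currently free global memory
// for the given device. Returns false if any query fails.
static bool print_device_memory(int device)
{
    cudaDeviceProp prop;
    size_t free_bytes = 0, total_bytes = 0;

    if (cudaSetDevice(device) != cudaSuccess) return false;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) return false;

    printf("%s: %.0f MiB total, %.0f MiB free\n",
           prop.name, to_mib(total_bytes), to_mib(free_bytes));
    return true;
}
```

Calling `print_device_memory(0)` before the allocations gives a number to set against the combined size of the three device buffers.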

Sorry, I don't know the specifics of increasing your pinned memory limit.

Upvotes: 1
