Reputation: 11
I am attempting to perform a cumulative sum using MATLAB's cumsum function on a 22000x22000 gpuArray filled with -1s, 0s and 1s. I am using an NVIDIA GeForce GTX 780 Ti, which has 3GB of memory. A double-precision gpuArray is too large (3.9GB), but naturally the single-precision one fits.
Attempting to do a cumsum on my single-precision gpuArray again resulted in reaching my memory limit, but I am not sure whether this is due to the datatypes used internally or to the way cumsum is calculated, as it is a MATLAB p-file. Either way, it means I have little control over the datatypes used for the calculation. Edit: Also, cumsum does not support integer datatypes. Edit: On further inspection, the result of performing this on a reduced array is a single, so it is highly likely that the function operates in the type of its input.
So my question is: is there an alternative to cumsum (not loops, see note), whether through MATLAB or CUDA, that allows the datatype to be specified? Or can someone outline how to vectorise (matricise?) the cumsum operation so I can write it myself?
EDIT: The alternative must be able to operate on integer types, as just producing the CUM matrix will exceed the memory limit: two single arrays (input and result) take as much memory as one double array.
NOTE: Given that I will be performing this computation a significant number of times (no bound presented in paper), I would ideally keep the >200x speed increase of cumsum(gpuArray) vs cumsum(double). And don't even mention loops, they are ridiculously slow.
tic;CUM = cumsum(W,2);toc
Elapsed time is 0.002180 seconds.
K = gather(W);
tic;CUM = cumsum(K,2);toc
Elapsed time is 0.125203 seconds.
Upvotes: 1
Views: 1492
Reputation: 528
Just a remark: timing your code like this is not correct when dealing with a gpuArray. You should time it like this:
tic;CUM = cumsum(W,2);wait(gpuDevice());toc
Otherwise, MATLAB doesn't count the actual computation: GPU operations are launched asynchronously, so the CPU timing function toc can execute before the computation has finished.
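As a hedged alternative to placing wait(gpuDevice()) by hand, the sketch below uses gputimeit, which synchronises with the device and repeats the measurement. It assumes a Parallel Computing Toolbox release that ships gputimeit; the 2000x2000 size is a stand-in chosen for illustration, not the size from the question.

```matlab
% Sketch: GPU vs CPU timing of cumsum with proper synchronisation.
% Assumption: gputimeit is available in the installed toolbox release.
n = 2000;                                   % stand-in size (question uses 22000)
W = gpuArray(single(randi([-1 1], n, n)));  % single-precision data on the GPU
K = gather(W);                              % same data in host memory

tGpu = gputimeit(@() cumsum(W, 2));         % waits for the GPU before stopping the clock
tCpu = timeit(@() cumsum(K, 2));            % CPU counterpart for comparison

fprintf('GPU: %.6f s, CPU: %.6f s, ratio: %.1fx\n', tGpu, tCpu, tCpu / tGpu);
```

The ratio printed here is the one to compare against the speed-up quoted in the question, since both measurements now cover the full computation.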
For less memory consumption, you can use
W = gpuArray(randi([0 1],22000,22000,'uint8'));
This works with cumsum on the GPU device.
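If cumsum in a given release does not accept integer gpuArrays (the question reports that it did not), one possible workaround is to keep the input as int8 and the result as int16 on the device, and convert only one block of rows at a time to single for the actual accumulation. The sketch below is an illustration under those assumptions: blockRows is a hypothetical tuning parameter, the size is reduced so it runs quickly, and the loop iterates over a handful of row blocks rather than over elements. Single precision represents integers exactly up to 2^24, so row sums bounded by 22000 are not rounded.

```matlab
% Sketch: row-blocked cumulative sum along dimension 2 on the GPU.
% Only one block of rows exists in single precision at any time.
n = 2200;            % stand-in size; the question uses 22000
blockRows = 500;     % hypothetical tuning parameter, choose to fit free memory

W = gpuArray(randi([-1 1], n, n, 'int8'));   % input: 1 byte per element
C = zeros(n, n, 'int16', 'gpuArray');        % result: 2 bytes per element

for r0 = 1:blockRows:n
    r1 = min(r0 + blockRows - 1, n);         % last row of this block
    blk = single(W(r0:r1, :));               % temporary single-precision copy
    C(r0:r1, :) = int16(cumsum(blk, 2));     % rows are independent along dim 2
end
```

Because cumsum(W,2) treats every row independently, splitting by rows does not change the result; it only bounds the size of the single-precision temporaries.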
Upvotes: 1
Reputation: 1566
If you're only going to be using the values -1, 0 and 1 in your matrix, you can get away with one byte of memory per element by using the signed 8-bit integer type int8, which allows numbers from -128 to 127. (With only the values -1, 0 and 1, you could technically fit four values into one byte with 2 bits each, but you may only need to do that if you're still running out of memory.)
So if you want to initialise your array with int8s directly on the GPU, you can do this:
W = zeros(22000,22000,'int8','gpuArray');
Which should be <500MB in size.
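The size figures quoted in this thread can be checked with a few lines of arithmetic. The sketch below also illustrates a caveat for the result of the cumulative sum: on the CPU, cumsum on an integer array returns the same integer class and saturates at the type's limits. A row of 22000 entries can reach a magnitude of 22000, which is outside the int8 range but inside the int16 range (up to 32767). Whether the GPU implementation behaves identically is an assumption here, not something the thread confirms.

```matlab
% Sketch: memory footprint per datatype, plus integer saturation in cumsum.
n = 22000;                                  % matrix side length from the question
types = {'double', 'single', 'int16', 'int8'};
bytesPerElement = [8 4 2 1];
sizeGB = n^2 .* bytesPerElement ./ 1e9;     % decimal gigabytes
for k = 1:numel(types)
    fprintf('%-6s %.3f GB\n', types{k}, sizeGB(k));
end

% Integer cumsum keeps the input class and saturates instead of wrapping.
x   = ones(1, 200, 'int8');
c8  = cumsum(x);            % int8 result, clipped at intmax('int8') = 127
c16 = cumsum(int16(x));     % int16 result, reaches 200
fprintf('int8 end value: %d, int16 end value: %d\n', c8(end), c16(end));
```

Under these numbers, an int8 input plus an int16 result comes to roughly 1.45GB, which leaves headroom on a 3GB card.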
Upvotes: 1