권재범

Reputation: 11

How can I accelerate simple MATLAB code on a GPU device?

I ran into some problems while testing simple code with MATLAB GPU computing.

In the first case, I compared the fft2 computation time on the CPU and on the GPU.

By CPU:

A=rand(2000);
tic
for K=1:200
   yy=fft2(A);
end
toc

By GPU:

A=gpuArray(A);
tic
for K=1:200
   yy=fft2(A);
end
toc

It took 5.984209 sec on the CPU and 0.036392 sec on the GPU. That seems to be a reasonable result.

In the second case, I tried a simple calculation.

By CPU:

D=rand(1,2000);
E=rand(1,2000);
tic
for K=1:2000
  pp=sqrt(D(K)^2+E(K)^2)/E(K);
end
toc

By GPU:

F=gpuArray(D);
G=gpuArray(E);
tic
for K=1:2000
  qq=sqrt(F(K)^2+G(K)^2)/G(K);
end
toc

It took 0.002940 sec on the CPU and 2.699595 sec on the GPU. That is a very strange result!

Why is this happening? I know that it can be made faster by using 'arrayfun' with gpuArray inputs.

Is there no way to run a 'for' loop on the GPU other than through arrayfun?

I know that one GPU has a few thousand cores. In that case, is it possible to use a 'parfor' loop with a single GPU?

For functions that are not built in, it is difficult to satisfy all the conditions that 'arrayfun' requires.

So I think it is natural to use a 'for' loop inside a user-defined function. However, as the results above show, a 'for' loop slows down the whole process for gpuArray inputs (it is slower than the CPU result).

So I think that general code cannot always be converted into a 'GPU form' that takes full advantage of GPU computing. Is that right?

Upvotes: 1

Views: 238

Answers (1)

Edric

Reputation: 25140

As you have discovered, a FOR loop over scalar elements of gpuArray data performs very poorly and is almost always a bad idea. In general, you need to use either vectorised operations or arrayfun to get good performance. If that doesn't give you enough flexibility or performance, you can always use the CUDAKernel interface or the GPU MEX interface.
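For the second case in the question, the two suggestions might look roughly like the sketch below. It reuses D, E, F and G from the question; the names pp, qq1, qq2, tVec and tFun are only illustrative, and it assumes Parallel Computing Toolbox, a supported GPU, and R2019b or later for canUseGPU. Treat it as a starting point rather than a tuned implementation.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and a supported GPU.
D = rand(1, 2000);
E = rand(1, 2000);

% CPU reference, vectorised: one expression over the whole array
% instead of a loop over scalar elements.
pp = sqrt(D.^2 + E.^2) ./ E;

if canUseGPU()   % assumption: R2019b or later
    F = gpuArray(D);
    G = gpuArray(E);

    % Option 1: vectorised element-wise operations on the gpuArrays.
    qq1 = sqrt(F.^2 + G.^2) ./ G;

    % Option 2: arrayfun applies the scalar function to every element
    % pair on the GPU.
    qq2 = arrayfun(@(f, g) sqrt(f^2 + g^2) / g, F, G);

    % GPU work is launched asynchronously, so gputimeit is a more
    % reliable way to time it than a bare tic/toc.
    tVec = gputimeit(@() sqrt(F.^2 + G.^2) ./ G);
    tFun = gputimeit(@() arrayfun(@(f, g) sqrt(f^2 + g^2) / g, F, G));
    fprintf('vectorised: %g s, arrayfun: %g s\n', tVec, tFun);

    % Bring the results back to the host and compare with the CPU values.
    fprintf('max abs difference (vectorised): %g\n', max(abs(gather(qq1) - pp)));
    fprintf('max abs difference (arrayfun):   %g\n', max(abs(gather(qq2) - pp)));
end
```

With only 2000 elements the CPU may still win, because there is too little work to offset the launch and transfer overhead. The GPU versions usually pay off only for much larger arrays.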

Upvotes: 1
