I'm trying to speed up my computation by using gpuArray
. However, the code below runs slower on the GPU than on the CPU.
for i=1:10
calltest;
end
function [t1,t2]=calltest
N=10;
tic
u=gpuArray(rand(1,N).^(1./[N:-1:1]));
t1=toc
tic
u2=rand(1,N).^(1./[N:-1:1]);
t2=toc
end
where I get
t1 =
4.8445e-05
t2 =
1.4369e-05
I have an NVIDIA GTX 850M graphics card. Am I using gpuArray
incorrectly? This code is wrapped inside a function, and the function is called by a loop thousands of times.
Upvotes: 1
Views: 567
The method of comparison blurs the root cause of the problem:
N = 10;
R = rand( 1, N );
tic; <a-gpu-based-computing-section>; GPU_t = toc
tic; c = R.^( 1. / [N:-1:1] ); CPU_t = toc
Trying just 10 elements will not make the observation clear, as an overhead-naive formulation of Amdahl's Law does not explicitly emphasise the added time spent on the CPU-side GPU-kernel assembly & transport plus the ( CPU-to-GPU + GPU-to-CPU ) data-handling phases. These add-on phases only become negligibly small when compared to
a) an indeed large-scale vector / matrix GPU-kernel processing, which N ~ 10 obviously is not,
or
b) an indeed "mathematically-dense" GPU-kernel processing, which R.^()
obviously is not.
So,
do not blame GPU-computing for having acquired a must-do part ( the overheads ), as it cannot get working without these prior add-ons in time ( while the CPU may, during that same amount of time, already produce the final result - Q.E.D. )
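A fairer comparison lets timeit / gputimeit handle the warm-up and GPU synchronisation, and grows N until the kernel work dominates the overheads. A minimal sketch ( assuming the Parallel Computing Toolbox is installed and a CUDA device is present ):
N_sweep = [ 10, 1e3, 1e5, 1e7 ];            % grow N until GPU wins
for N = N_sweep
    cpu_f = @() rand( 1, N ).^( 1 ./ (N:-1:1) );                    % all on CPU
    gpu_f = @() rand( 1, N, 'gpuArray' ).^( 1 ./ gpuArray(N:-1:1) );% all on GPU
    fprintf( 'N = %8d   CPU: %.6f s   GPU: %.6f s\n', ...
             N, timeit( cpu_f ), gputimeit( gpu_f ) )
end
For tiny N the CPU column will stay ahead; only at large N does the per-element GPU throughput outweigh the fixed launch and transfer costs.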
N = 10; %% 100, 1000, 10000, 100000, ..
tic; CPU_hosted = rand( N, 'single' ); %% 'double'
CPU_gen_RAND = toc
tic; GPU_hosted_IN1 = gpuArray( CPU_hosted );
GPU_xfer_h2d = toc
tic; GPU_hosted_IN2 = rand( N, 'gpuArray' );
GPU_gen__h2d = toc
tic; <kernel-generation-with-might-be-lazy-eval-deferred-xfer-setup>;
GPU_kernel_AssyExec = toc
tic; CPU_hosted_RES = gather( GPU_hosted_RES );
GPU_xfer_d2h = toc
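One caveat on the tic/toc pairs above: GPU kernels launch asynchronously, so toc can fire before the device has finished and under-report the kernel time. A sketch of a synchronised measurement ( assuming a CUDA device is selected; the matrix-multiply here is just an illustrative dense workload ):
dev = gpuDevice;                         % handle to the current GPU
A   = rand( 4096, 'gpuArray' );          % device-resident input
tic;  B = A * A;  wait( dev );           % block until the kernel completes
GPU_kernel_sync = toc
tic;  C = gather( B );                   % device-to-host transfer only
GPU_xfer_d2h_only = toc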
Upvotes: 2