Reputation: 31
This is one of the standard code examples you find everywhere...
import time
import numpy
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import pycuda.autoinit

size = int(1e7)  # linspace's num argument must be an integer

# CPU (numpy) version
t0 = time.time()
x = numpy.linspace(1, size, size).astype(numpy.float32)
y = numpy.sin(x)
t1 = time.time()
cpuTime = t1 - t0
print(cpuTime)

# GPU (pycuda) version, including host-to-device and device-to-host transfers
t0 = time.time()
x_gpu = gpuarray.to_gpu(x)
y_gpu = cumath.sin(x_gpu)
y = y_gpu.get()  # copies the result back to the host
t1 = time.time()
gpuTime = t1 - t0
print(gpuTime)
The results are: 200 ms for the CPU and 2.45 s for the GPU... more than 10X slower.
I'm running on Windows 10... Visual Studio 2015 with PTVS...
Best regards...
Steph
Upvotes: 1
Views: 925
Reputation: 9968
It looks like pycuda introduces some additional overhead the first time you call the cumath.sin() function (~400 ms on my system). I suspect this is due to the need to compile CUDA code for the function being called. More importantly, this overhead is independent of the size of the array being passed to the function. Additional calls to cumath.sin() are much faster, since the CUDA code has already been compiled. On my system, the GPU code given in the question runs in about 20 ms (for repeated runs), compared to roughly 130 ms for the numpy code.
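A minimal way to see this yourself, assuming the same setup as in the question, is to time the first call separately from a repeated call (a rough sketch; the exact numbers will of course vary by machine):

import time
import numpy
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import pycuda.autoinit

size = int(1e7)
x = numpy.linspace(1, size, size).astype(numpy.float32)
x_gpu = gpuarray.to_gpu(x)

# First call pays the one-time setup/compilation cost
t0 = time.time()
cumath.sin(x_gpu).get()   # .get() copies the result back, which also synchronizes
print("first call: ", time.time() - t0)

# Repeated calls measure only the computation and the transfer back
t0 = time.time()
cumath.sin(x_gpu).get()
print("repeat call:", time.time() - t0)

Note that calling .get() (or otherwise synchronizing) before reading the timer matters here, since GPU kernel launches are asynchronous.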
I don't profess to know much at all about the inner workings of pycuda, so I'd be interested to hear other people's opinions on this.
Upvotes: 2