user2267896

Reputation: 193

pycuda.gpuarray.dot() very slow at first call

I have a working conjugate gradient method implementation in PyCUDA that I want to optimize. It uses a self-written matrix-vector multiplication and the PyCUDA-native gpuarray.dot and gpuarray.mul_add functions.

Profiling the program with kernprof.py/line_profiler showed that most of the time until convergence (>60%) is spent in a single gpuarray.dot() call, the first one (about 0.2 seconds). All subsequent calls to gpuarray.dot() take about 7 microseconds, and all calls receive the same kind of input vectors (400 doubles each).

Is there a reason for this? In the end it's just a constant overhead, but it makes the profiling difficult. I wanted to ask on the PyCUDA mailing list, but I wasn't able to subscribe with an @gmail.com address. If anyone has an explanation for either the strange .dot() behavior or my inability to subscribe to that mailing list, please give me a hint ;)

Upvotes: 1

Views: 792

Answers (1)

fabmilo

Reputation: 48330

One reason could be that PyCUDA compiles the kernel before uploading it to the GPU. As far as I remember, though, that should happen only the very first time the kernel is executed.

One solution could be to "warm up" the kernel by executing it once and then start the profiling procedure.
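A minimal sketch of the warm-up approach, assuming you time a zero-argument callable: call it once (untimed) so any one-time cost such as kernel compilation happens there, then time repeated calls. The helper name `time_after_warmup` is hypothetical. Because CUDA kernel launches are asynchronous, the GPU usage shown in the comments calls `.get()` on the result so that each timed call actually waits for the GPU.

```python
import statistics
import time
from typing import Callable


def time_after_warmup(fn: Callable[[], object], warmup: int = 1, repeat: int = 100) -> float:
    """Return the median wall-clock time of fn() in seconds, excluding warm-up calls."""
    # Warm-up: one-time costs (kernel compilation, module loading, caches) are paid here.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


# Hypothetical usage with PyCUDA (requires a CUDA GPU; not executed here):
#   import numpy as np
#   import pycuda.autoinit
#   import pycuda.gpuarray as gpuarray
#   a = gpuarray.to_gpu(np.random.rand(400))
#   b = gpuarray.to_gpu(np.random.rand(400))
#   t = time_after_warmup(lambda: gpuarray.dot(a, b).get())
```

Using the median rather than the mean keeps a single stray slow sample from dominating the result.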

Upvotes: 2
