Reputation: 193
I have a working conjugate gradient method implementation in PyCUDA that I want to optimize. It uses a self-written matrix-vector multiplication and the PyCUDA-native gpuarray.dot
and gpuarray.mul_add
functions.
Profiling the program with kernprof.py/line_profiler
shows that most of the time until convergence (>60%) is spent in a single gpuarray.dot()
call, which takes about 0.2 seconds.
All following calls of gpuarray.dot()
take about 7 microseconds, and all calls receive the same type of input vectors (size: 400 doubles).
Is there any reason why? In the end it is just a constant overhead, but it makes the profiling difficult.
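For reference, a stripped-down snippet (not my actual CG code, just the dot() call in isolation with made-up vectors of the same size) that shows the kind of timing I mean:

```python
import time
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.gpuarray as gpuarray

# Dummy vectors of the size used in my solver (400 doubles).
a = gpuarray.to_gpu(np.random.rand(400))
b = gpuarray.to_gpu(np.random.rand(400))

for i in range(3):
    t0 = time.time()
    gpuarray.dot(a, b).get()    # .get() forces synchronization with the GPU
    print("call %d: %.6f s" % (i, time.time() - t0))
# The first iteration is far slower than the following ones.
```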
I wanted to ask the question on the PyCUDA mailing list, but I wasn't able to subscribe with an @gmail.com address. If anyone has either an explanation for the strange .dot()
behavior or for my inability to subscribe to that mailing list, please give me a hint ;)
Upvotes: 1
Views: 792
Reputation: 48330
One reason would be that PyCUDA compiles the kernel before uploading it. As far as I remember, though, that should happen only the very first time it is executed.
One solution could be to "warm up" the kernel by executing it once and then start the profiling procedure.
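A minimal sketch of such a warm-up, assuming vectors of the size the question mentions (400 doubles):

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.gpuarray as gpuarray

a = gpuarray.to_gpu(np.random.rand(400))
b = gpuarray.to_gpu(np.random.rand(400))

# Warm-up call: absorbs any one-time kernel compilation/caching cost.
gpuarray.dot(a, b).get()

# ... start profiling here; subsequent dot() calls reuse the compiled kernel.
result = gpuarray.dot(a, b).get()
```

That way the one-time setup cost no longer shows up inside the profiled region, and the remaining timings reflect the steady-state cost of each call.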
Upvotes: 2