Reputation: 51
I'm new to CUDA and OpenCL.
I have translated a program's kernels from CUDA to OpenCL, using the same seeds for random number generation in both versions.
While the OpenCL version produces exactly the same results on every run, the CUDA version gives slightly different results each run.
I'm compiling the CUDA version without -use_fast_math.
My device is compute capability 1.1.
Any idea about what could be the reason?
Thanks in advance
Upvotes: 1
Views: 1054
Reputation: 51
I found the problem. In the original code, some values were being updated asynchronously and had not been completely updated yet when they were read. Thanks everybody for the help, and sorry for the trouble.
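For reference, here is a minimal sketch of the kind of mistake I mean (hypothetical code, not my actual kernels): the host reads results before the kernel and the asynchronous copy that updates them have finished, so each run can see a different mix of old and new values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example: every element should be doubled before the
// host looks at it.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *h = 0, *d = 0;
    cudaMallocHost((void **)&h, n * sizeof(float));   // pinned host buffer
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Without this synchronization the read below races with the
    // asynchronous copy and can see partially updated data, giving
    // slightly different output from run to run.
    cudaDeviceSynchronize();

    printf("h[0] = %f\n", h[0]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```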
Upvotes: 1
Reputation: 151849
Devices of compute capability 1.1 do not support double operations. So if you are using double, it gets demoted to float. That could possibly affect your results, although a compute capability 1.1 device cannot support double in OpenCL either, AFAIK.
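As an illustration (a hypothetical kernel, not taken from your code): when something like this is built for a compute capability 1.1 target, nvcc demotes the double arithmetic to float and prints a demotion warning, so the numbers will not match a true double-precision build.

```cuda
// Hypothetical kernel: compiled with e.g. "nvcc -arch=sm_11", the
// double math below is demoted to float (nvcc warns about it), so
// the accumulated sum loses precision compared to an sm_13+ build.
__global__ void sum_demo(const double *in, double *out, int n)
{
    double acc = 0.0;          // effectively a float on sm_11
    for (int i = 0; i < n; ++i)
        acc += in[i];
    *out = acc;
}
```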
"My question actually is: are there any CUDA compile options that may affect the accuracy of the CUDA results?"
Yes, there are a variety of nvcc options that affect CUDA's floating-point behavior, for example -use_fast_math, -fmad, -prec-div, -prec-sqrt, and -ftz.
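To illustrate with a made-up kernel (not from your code): whether nvcc contracts a multiply and an add into one fused multiply-add, controlled by -fmad and implicitly by -use_fast_math, changes how the result is rounded.

```cuda
// Hypothetical kernel: with -fmad=true (the default) the line below
// may compile to a single fused multiply-add with one rounding step;
// with -fmad=false the multiply and add are rounded separately, so
// the two builds can differ in the last bit of the result.
__global__ void axpy(const float *a, const float *x, const float *y,
                     float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] * x[i] + y[i];
}
```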
I don't know why any of this would lead to variation from one run to the next, however. It's likely that you have a bug in the code.
Upvotes: 1