Reputation: 2611
I'm writing a CUDA program that will run on thousands of different GPUs. Those machines have different display driver versions installed, and I cannot force them to update to the latest driver. Most of the code runs fine on the 'old' machines, but one particular piece fails.
Here's the problem:
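#include <stdio.h>
#include <cuda.h>
#include <cuda_profiler_api.h>

__global__
void test()
{
    unsigned i = 64;
    unsigned j = 192;  // overwritten by the loop initializer below
    int k = 7;

    // Clear the set bits of i from bit (k - 1) downward until a clear bit is found
    for (j = 1 << (k - 1); i & j; j >>= 1)
        i ^= j;
    i ^= j;  // then set that bit

    printf("i,j,k: %u,%u,%d\n", i, j, k);
    // i,j,k: 32,32,7 (correct)
    // i,j,k: 0,64,7 (wrong)
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    test<<<1,1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
    return 0;
}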
The code prints 32,32,7 on a GPU with the latest driver, which is the correct result. On an old driver (lower than CUDA 6.5), however, it prints 0,64,7.
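To confirm which of the two outputs is right, the loop can be traced on the host. The following is only a sketch: `LoopResult`, `reference_loop`, `print_reference` and the `HOST_DEVICE` macro are hypothetical names introduced here, and the macro guard is an assumption that lets the same file build with either nvcc or a plain C++ compiler.

```cpp
#include <cstdio>

// Expands to CUDA qualifiers under nvcc and to nothing for a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: holds the three values the kernel prints.
struct LoopResult {
    unsigned i;
    unsigned j;
    int k;
};

// Same statements as the kernel body, with the start values passed in.
HOST_DEVICE inline LoopResult reference_loop(unsigned i, int k)
{
    unsigned j;
    for (j = 1u << (k - 1); i & j; j >>= 1)
        i ^= j;   // clear each set bit, walking down from bit (k - 1)
    i ^= j;       // set the first clear bit that stopped the loop
    LoopResult r = {i, j, k};
    return r;
}

// Hypothetical helper: prints the host result in the same format as the kernel.
inline void print_reference()
{
    LoopResult r = reference_loop(64u, 7);
    std::printf("i,j,k: %u,%u,%d\n", r.i, r.j, r.k);  // i,j,k: 32,32,7
}
```

Step by step with i = 64 and k = 7: j starts at 64, `i & j` is nonzero so i becomes 0 and j becomes 32; then `i & j` is 0, the loop ends, and the final `i ^= j` gives i = 32. Calling `print_reference()` from `main` should therefore match the "correct" line above.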
I'm looking for any workaround for this.
Environment:
Upvotes: 0
Views: 54
Reputation: 72351
There is no workaround. The runtime API is versioned and the minimum driver version requirement is non-negotiable.
Your only two choices are to develop using the lowest common denominator toolkit version that supports the driver being used, or switch to the driver API.
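The version requirement can at least be detected at startup, so that an unsupported machine fails loudly instead of producing wrong results. A minimal sketch, assuming the program is built with nvcc: `cudaDriverGetVersion` and `cudaRuntimeGetVersion` are existing runtime API calls, while `version_major`, `version_minor` and `driver_supports_runtime` are hypothetical helper names introduced here. Whether this check catches the specific failure in the question is not confirmed.

```cpp
#include <cstdio>

// Only pull in the CUDA runtime when compiling with nvcc, so the pure
// helpers below can also be built with a plain C++ compiler.
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// CUDA encodes versions as 1000 * major + 10 * minor (CUDA 6.5 -> 6050).
inline int version_major(int v) { return v / 1000; }
inline int version_minor(int v) { return (v % 1000) / 10; }

// Hypothetical helper: the runtime needs a driver at least as new as itself.
// A driver version of 0 (no driver installed) fails this check as well.
inline bool driver_supports_runtime(int driverVersion, int runtimeVersion)
{
    return driverVersion >= runtimeVersion;
}

#ifdef __CUDACC__
int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // latest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtimeVersion);  // version of the runtime this binary uses

    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                version_major(driverVersion), version_minor(driverVersion),
                version_major(runtimeVersion), version_minor(runtimeVersion));

    if (!driver_supports_runtime(driverVersion, runtimeVersion)) {
        std::fprintf(stderr, "installed driver is too old for this CUDA runtime\n");
        return 1;
    }
    return 0;
}
#endif
```

This only reports the mismatch; it does not make a newer runtime work on an older driver.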
Upvotes: 2
Reputation: 2611
I found a workaround, though it is very slow: keep the variables in local memory rather than in registers by adding the volatile keyword to the declarations of i and j:
volatile unsigned i = 64;
volatile unsigned j = 192;
Upvotes: 0