aj3423

Reputation: 2611

Weird CUDA kernel result on old display driver

I'm writing a CUDA program that will run on thousands of different GPUs. Those machines have different display driver versions installed, and I cannot force them to update to the latest driver. Most of the code runs fine on those "old" machines, but one particular piece of code fails:

Here's the problem:

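#include <stdio.h>
#include <cuda_runtime.h>

__global__
void test()
{
    unsigned i = 64;
    unsigned j = 192;   // overwritten by the loop initializer below
    int k = 7;

    // j starts at 1 << (k - 1) = 64 and moves down one bit per step
    // while i has that bit set; each such bit is cleared from i
    for(j = 1 << (k - 1); i & j; j >>= 1)
        i ^= j;
    i ^= j;

    printf("i,j,k: %u,%u,%d\n", i, j, k);
    // i,j,k: 32,32,7  (correct, latest driver)
    // i,j,k: 0,64,7   (wrong, old driver)
}

int main() {
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    test<<<1,1>>>();
    // Wait for the kernel to finish so its printf output is flushed before exit
    cudaDeviceSynchronize();
    return 0;
}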

On a GPU with the latest driver, the code prints 32,32,7, which is the correct result. On an old driver (one that only supports CUDA versions below 6.5), it prints 0,64,7.

I'm looking for any workaround for this.

Environment:

  1. Development: Win7 32-bit, VS2013, CUDA 6.5
  2. Correct result on: WinXP 32-bit (and Win7 32-bit), GTX 650 (latest driver)
  3. Wrong result on: WinXP 32-bit + GTX 750 Ti (old driver), WinXP 32-bit + GTX 750 (old driver)

Upvotes: 0

Views: 54

Answers (2)

talonmies

Reputation: 72351

There is no workaround. The runtime API is versioned and the minimum driver version requirement is non-negotiable.

Your only two choices are to develop with the lowest-common-denominator toolkit version, one that the oldest driver in use supports, or to switch to the driver API.

Upvotes: 2

aj3423

Reputation: 2611

I found a solution, though a very slow one: keep i and j in local memory instead of registers by adding the volatile keyword to their declarations:

volatile unsigned i = 64;
volatile unsigned j = 192;

Upvotes: 0
