Multi GPU usage with CUDA Thrust

Question

I want to use my two graphics cards for calculations with CUDA Thrust.

Running on a single card works well for either card, even when I store two device_vectors in the std::vector.

If I use both cards at the same time, the first iteration of the loop works without error. The second iteration fails, probably because the device pointer is no longer valid.

I am not sure what the exact problem is, or how to use both cards for calculation.

Minimal code sample:

std::vector<thrust::device_vector<float> > TEST() {
    std::vector<thrust::device_vector<float> > vRes;

    unsigned int iDeviceCount = GetCudaDeviceCount();
    for (unsigned int i = 0; i < iDeviceCount; i++) {
        checkCudaErrors(cudaSetDevice(i));
        thrust::host_vector<float> hvConscience(1024);

        // first run works, runs afterwards cause errors ..
        vRes.push_back(hvConscience); // this push_back causes the error on exec
    }
    return vRes;
}

Error message on execution:

terminate called after throwing an instance of 'thrust::system::system_error'
what():  invalid argument

talonmies · Accepted Answer

The problem here is that you are trying to perform a device-to-device copy of data between a pair of device_vectors which reside in different GPU contexts (because of the cudaSetDevice call). What you have perhaps overlooked is that this sequence of operations:

thrust::host_vector<float> hvConscience(1024);
vRes.push_back(hvConscience);

is performing a copy from hvConscience at each loop iteration. The Thrust backend expects the source and destination memory to lie in the same GPU context. In this case they do not, hence the error.
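To see why only the later iterations fail, it helps to count the copies that push_back makes. The sketch below is plain C++ (C++14) and does not touch the GPU: Buffer is a hypothetical stand-in for thrust::device_vector<float> that only counts how often it is copied, and both function names are made up for illustration. With a real device_vector, every counted copy would be a device-side memory copy, and a copy of an older element made while the std::vector reallocates on the second iteration would have its source on the first GPU while the second GPU is current.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for thrust::device_vector<float>. It holds no data
// and only counts copies; a real device_vector would copy device memory.
struct Buffer {
    static int copies;  // copy constructions and copy assignments so far
    std::size_t size;

    explicit Buffer(std::size_t n) : size(n) {}
    Buffer(const Buffer& other) : size(other.size) { ++copies; }
    Buffer& operator=(const Buffer& other) {
        size = other.size;
        ++copies;
        return *this;
    }
};
int Buffer::copies = 0;

// Stores elements by value, as the code in the question does.
// Returns how many Buffer copies were made.
int copies_when_storing_by_value(int count) {
    Buffer::copies = 0;
    std::vector<Buffer> v;
    for (int i = 0; i < count; ++i) {
        Buffer b(1024);
        // Copies b into the vector. If the vector has to grow, the
        // elements stored in earlier iterations are copied as well.
        v.push_back(b);
    }
    return Buffer::copies;
}

// Stores owning pointers instead. Only the pointers move around;
// the Buffer objects themselves are never copied.
int copies_when_storing_pointers(int count) {
    Buffer::copies = 0;
    std::vector<std::unique_ptr<Buffer>> v;
    for (int i = 0; i < count; ++i) {
        v.push_back(std::make_unique<Buffer>(1024));
    }
    return Buffer::copies;
}
```

Calling copies_when_storing_by_value(2) reports at least one copy per push_back, plus any made during reallocation, while copies_when_storing_pointers(2) reports none.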

What you probably want to do is work with a vector of pointers to device_vector instead, so something like:

typedef thrust::device_vector< float > vec;
typedef vec *p_vec;
std::vector< p_vec > vRes;

unsigned int iDeviceCount   = GetCudaDeviceCount();
for(unsigned int i = 0; i < iDeviceCount; i++) {
    cudaSetDevice(i); 
    p_vec hvConscience = new vec(1024);
    vRes.push_back(hvConscience);
}

[disclaimer: code written in browser, neither compiled nor tested, use at own risk]

This way you are only creating each vector once, in the correct GPU context, and then copy assigning a host pointer, which doesn't trigger any device side copies across memory spaces.
