user1261347

Reputation: 325

How to run "host" functions on GPU with CUDA?

I want to call a standard library function such as strcmp from a kernel running on the GPU, but I get:

error: calling a host function("strcmp") from a __device__/__global__ function("myKernel") is not allowed

I can understand printf not working, since the GPU has no stdout, but I would expect functions like strcmp to work. Should I copy the library implementation of strcmp into my code and add the __device__ qualifier, or is there another way?

Upvotes: 2

Views: 2778

Answers (2)

Prakash Dahal

Reputation: 4875

Hope this helps at least one person:

Since strcmp is not available in CUDA device code, we have to implement it ourselves:

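// Device-side replacement for strcmp: returns 0 if equal, -1 if s1 < s2, 1 if s1 > s2.
__device__ int my_strcmp(const char *s1, const char *s2) {
    // Advance while the characters match; stop at the terminating NUL.
    for (; *s1 == *s2; ++s1, ++s2)
        if (*s1 == 0)
            return 0;
    // Compare the first differing characters as unsigned char, as strcmp does.
    return *(const unsigned char *)s1 < *(const unsigned char *)s2 ? -1 : 1;
}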
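
As a rough sketch of how such a function might be called from a kernel, the snippet below repeats the comparison loop and uses it to test fixed-width records against a key. The names record_matches and match_kernel, and the layout (NUL-terminated records stored back to back in slots of `stride` bytes), are assumptions made for illustration and are not part of the original answer. The `__CUDACC__` guard only exists so the helpers can also be compiled and checked with a plain C++ compiler.

```cpp
#include <cstddef>

// Allow the helpers to build with a plain C++ compiler for host-side checks.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Same loop as above, marked so it is callable from both host and device code.
__host__ __device__ inline int my_strcmp(const char *s1, const char *s2) {
    for (; *s1 == *s2; ++s1, ++s2)
        if (*s1 == 0)
            return 0;
    return *(const unsigned char *)s1 < *(const unsigned char *)s2 ? -1 : 1;
}

// Hypothetical helper: record i starts at records + i * stride and is
// NUL-terminated within its slot.
__host__ __device__ inline bool record_matches(const char *records, size_t stride,
                                               size_t i, const char *key) {
    return my_strcmp(records + i * stride, key) == 0;
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread per record; matches[i] becomes 1 when
// record i equals key, 0 otherwise. All pointers must be device-accessible.
__global__ void match_kernel(const char *records, size_t stride, size_t n,
                             const char *key, int *matches) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        matches[i] = record_matches(records, stride, i, key) ? 1 : 0;
}
#endif
```

A launch along the lines of match_kernel<<<(n + 255) / 256, 256>>>(d_records, stride, n, d_key, d_matches) would cover all n records, assuming the d_ buffers have already been allocated and copied to the device.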

Upvotes: 0

harrism

Reputation: 27809

CUDA has a standard library, documented in the CUDA programming guide. It includes printf() for devices that support it (Compute Capability 2.0 and higher), as well as assert(). It does not include a complete string or stdio library at this point, however.
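
To illustrate the two supported calls, here is a minimal sketch that uses assert() and printf() inside a function callable from device code. The names checked_length and length_kernel are hypothetical, and the `__CUDACC__` guard is only there so the helper can also be built with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstdio>

// Allow the helper to build with a plain C++ compiler for host-side checks.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: validates its input with assert() and reports the
// result with printf(), both of which are usable in device code.
__host__ __device__ inline int checked_length(const char *s, int idx) {
    assert(s != nullptr);  // on the device, a failed assert stops the kernel
    int len = 0;
    while (s[len] != 0)
        ++len;
    printf("item %d: length %d\n", idx, len);
    return len;
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread per string. `strings` is assumed to be a
// device-accessible array of device-accessible, NUL-terminated strings.
__global__ void length_kernel(const char *const *strings, int n, int *lengths) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        lengths[i] = checked_length(strings[i], i);
}
#endif
```

Device-side printf output is buffered, so it typically only appears on the host after a synchronization point such as cudaDeviceSynchronize().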

Implementing your own standard library as Jason R. Mick suggests may be possible, but it is not necessarily advisable. In some cases, it may be unsafe to naively port functions from the sequential standard library to CUDA -- not least because some of these implementations are not meant to be thread safe (rand() on Windows, for example). Even if it is safe, it might not be efficient -- and it might not really be what you need.

In my opinion, you are better off avoiding standard library functions in CUDA that are not officially supported. If you need the behavior of a standard library function in your parallel code, first consider whether you really need it:

* Are you really going to do thousands of strcmp operations in parallel?
* If not, do you have strings to compare that are many thousands of characters long? If so, consider a parallel string comparison algorithm instead.

If you determine that you really do need the behavior of the standard library function in your parallel CUDA code, then consider how you might implement it (safely and efficiently) in parallel.
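
One possible shape for a parallel comparison of two long buffers is sketched below. It assumes both buffers have the same known length n, which is closer to memcmp than to strcmp. Each thread scans one chunk, and the earliest mismatch across all threads is collected with atomicMin. The names first_diff, order_at and first_diff_kernel are hypothetical, the chunking is not tuned for memory coalescing, and the `__CUDACC__` guard only exists so the helpers can be checked with a plain C++ compiler.

```cpp
// Allow the helpers to build with a plain C++ compiler for host-side checks.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: first index in [begin, end) where a and b differ,
// or `end` if the whole range is identical.
__host__ __device__ inline unsigned int first_diff(const unsigned char *a,
                                                   const unsigned char *b,
                                                   unsigned int begin,
                                                   unsigned int end) {
    for (unsigned int i = begin; i < end; ++i)
        if (a[i] != b[i])
            return i;
    return end;
}

// Hypothetical helper: turns the first mismatch index into -1, 0 or 1.
// idx >= n means no mismatch was found, so the buffers are equal.
__host__ __device__ inline int order_at(const unsigned char *a,
                                        const unsigned char *b,
                                        unsigned int idx, unsigned int n) {
    if (idx >= n)
        return 0;
    return a[idx] < b[idx] ? -1 : 1;
}

#ifdef __CUDACC__
// Hypothetical kernel: thread t scans bytes [t * chunk, min(n, (t + 1) * chunk)).
// *result must be initialised to n before the launch; afterwards it holds the
// smallest mismatch index, or n if the buffers are identical.
__global__ void first_diff_kernel(const unsigned char *a, const unsigned char *b,
                                  unsigned int n, unsigned int chunk,
                                  unsigned int *result) {
    unsigned int t = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned long long begin64 = (unsigned long long)t * chunk;
    if (begin64 >= n)
        return;
    unsigned int begin = (unsigned int)begin64;
    unsigned int end = (n - begin < chunk) ? n : begin + chunk;
    unsigned int idx = first_diff(a, b, begin, end);
    if (idx < end)
        atomicMin(result, idx);
}
#endif
```

After the kernel finishes, passing the collected index to order_at gives the sign of the comparison. Whether this beats a single sequential scan depends on how long the strings are and how early they tend to differ.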

Upvotes: 2
