Reputation: 153
I have installed the CUDA runtime and drivers, version 7.0, on my workstation (Ubuntu 14.04, 2x Intel Xeon E5 + 4x Tesla K20m). I've used the following program to check whether my installation works:
    #include <stdio.h>

    __global__ void helloFromGPU()
    {
        printf("Hello World from GPU!\n");
    }

    int main(int argc, char **argv)
    {
        printf("Hello World from CPU!\n");
        helloFromGPU<<<1, 1>>>();
        printf("Hello World from CPU! Again!\n");
        cudaDeviceSynchronize();
        printf("Hello World from CPU! Yet again!\n");
        return 0;
    }
I get the correct output, but it takes an enormous amount of time:
    $ nvcc hello.cu -O2
    $ time ./hello > /dev/null
    real 0m8.897s
    user 0m0.004s
    sys  0m1.017s
If I remove all device code, the overall execution takes 0.001s. So why does my simple program take almost 10 seconds?
Upvotes: 7
Views: 624
Reputation: 72348
The apparent slow runtime of your example is due to the underlying fixed cost of setting up the GPU context.
Because you are running on a platform that supports unified addressing, the CUDA runtime has to map your 64GB of host RAM and the 4 x 5120MB from your GPUs into a single virtual address space and register that address space with the Linux kernel.
There are a lot of kernel API calls required to do that, and it isn't fast. I would guess that is the main source of the slow performance you are observing. You should view this as a fixed start-up cost which must be amortised over the life of your application. In real-world applications, a 10-second startup is trivial and of no real importance. In a hello world example, it isn't.
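If you want to confirm this yourself, a common idiom is to force context creation with a no-op runtime call such as `cudaFree(0)` and time it separately from the kernel launch. This is only a minimal sketch of that measurement, not a precise profile; the timings you see will depend on your driver and hardware:

    #include <stdio.h>
    #include <time.h>
    #include <cuda_runtime.h>

    __global__ void helloFromGPU()
    {
        printf("Hello World from GPU!\n");
    }

    static double seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        // Force lazy context creation now, so its cost is measured alone.
        double t0 = seconds();
        cudaFree(0);
        double t1 = seconds();

        // With the context already established, the launch itself is cheap.
        helloFromGPU<<<1, 1>>>();
        cudaDeviceSynchronize();
        double t2 = seconds();

        printf("context setup: %.3f s\n", t1 - t0);
        printf("kernel launch + sync: %.3f s\n", t2 - t1);
        return 0;
    }

On a multi-GPU box like yours you can also shrink the amount of memory that has to be mapped by restricting which devices are visible, e.g. running with `CUDA_VISIBLE_DEVICES=0 ./hello`, which should reduce the setup time noticeably.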
Upvotes: 7