Reputation: 915

Why does printf() work within a kernel, but using std::cout doesn't?

I have been exploring the field of parallel programming and have written basic kernels in Cuda and SYCL. I have encountered a situation where I had to print inside the kernel and I noticed that std::cout inside the kernel does not work whereas printf works. For example, consider the following SYCL Codes - This works -

void print(float*A, size_t N){
    buffer<float, 1> Buffer{A, {N}};
    queue Queue((intel_selector()));
    Queue.submit([&Buffer, N](handler& Handler){
       auto accessor = Buffer.get_access<access::mode::read>(Handler);
       Handler.parallel_for<dummyClass>(range<1>{N}, [accessor](id<1>idx){
           printf("%f", accessor[idx[0]]);
       });
    });
}

whereas if I replace the printf with std::cout<<accessor[idx[0]] it raises a compile time error saying - Accessing non-const global variable is not allowed within SYCL device code. A similar thing happens with CUDA kernels. This got me thinking that what may be the difference between printf and std::coout which causes such behavior.

Also suppose If I wanted to implement a custom print function to be called from the GPU, how should I do it?
TIA

Upvotes: 3

Answers (3)

einpoklum

Reputation: 131960

This got me thinking that what may be the difference between printf and std::cout which causes such behavior.

Yes, there is a difference. The printf() which runs in your kernel is not the standard C library printf(). A different call is made, to an on-device function (the code of of which is closed, if it at all exists in CUDA C). That function uses a hardware mechanism on NVIDIA GPUs - a buffer for kernel threads to print into, which gets sent back over to the host side, and the CUDA driver then forwards it to the standard output file descriptor of the process which launched the kernel.

std::cout does not get this sort of a compiler-assisted replacement/hijacking - and its code is simply irrelevant on the GPU.

A while ago, I implemented an std::cout-like mechanism for use in GPU kernels; see this answer of mine here on SO for more information and links. But - I decided I don't really like it, and it compilation is rather expensive, so instead, I adapted a printf()-family implementation for the GPU, which is now part of the cuda-kat library (development branch).

That means I've had to answer your second question for myself:

If I wanted to implement a custom print function to be called from the GPU, how should I do it?

Unless you have access to undisclosed NVIDIA internals - the only way to do this is to use printf() calls instead of C standard library or system calls on the host side. You essentially need to modularize your entire stream over the low-level primitive I/O facilities. It is far from trivial.

Upvotes: 7

Rod Burns

Reputation: 2244

In SYCL you cannot use std::cout for output on code not running on the host for similar reasons to those listed in the answer for CUDA code.

This means if you are running kernel code on the "device" (e.g. a GPU) then you need to use the stream class. There is more information about this in the SYCL developer guide section called Logging.

Upvotes: 5

Oblivion

Reputation: 7374

There is no __device__ version of std::cout, so only printf can be used in device code.

Upvotes: 0

Why does printf() work within a kernel, but using std::cout doesn&#39;t?

Answers (3)

Related Questions

Why does printf() work within a kernel, but using std::cout doesn't?