How does printf work on CUDA compute >= 2

Question

In the earlier days printf was not supported, and we would either run CUDA programs in the emulator or copy the variables back to the host and print them there.

Now that CUDA (arch 2 and greater) supports printf, I am curious to know how it works. How does the GPU internally get printf output onto the screen? What was the limiting factor in compute capability 1?

Albert · Accepted Answer

From the CUDA C Programming Guide:

printf prints formatted output from a kernel to a host-side output stream.

The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten. It is flushed only when one of these actions is performed:

...

Internally printf() uses a shared data structure and so it is possible that calling printf() might change the order of execution of threads. In particular, a thread which calls printf() might take a longer execution path than one which does not call printf(), and that path length is dependent upon the parameters of the printf(). Note, however, that CUDA makes no guarantees of thread execution order except at explicit __syncthreads() barriers, so it is impossible to tell whether execution order has been modified by printf() or by other scheduling behaviour in the hardware.
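To make the quoted behaviour concrete, here is a minimal sketch of a kernel that calls printf() and a host helper that launches it. The names hello_kernel and launch_hello are hypothetical, chosen only for illustration. The sketch assumes cudaDeviceSynchronize() is used as the point at which the host picks up the buffered output, which is how it is commonly done.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread writes one record into the
// device-side printf buffer. Nothing reaches the screen at this point.
__global__ void hello_kernel()
{
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: launches the kernel and waits for it to finish.
// The buffered records are printed by the host once it synchronizes.
cudaError_t launch_hello(int blocks, int threads_per_block)
{
    hello_kernel<<<blocks, threads_per_block>>>();

    cudaError_t err = cudaGetLastError();   // catches launch errors
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();         // catches execution errors
}
```

Calling launch_hello(2, 4) should produce eight lines, but the order in which the threads' lines appear is not something to rely on, for the reason given in the quoted paragraph.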

The following API functions get and set the size of the buffer used to transfer the printf() arguments and internal metadata to the host (default is 1 megabyte):

cudaDeviceGetLimit(size_t* size, cudaLimitPrintfFifoSize)

cudaDeviceSetLimit(cudaLimitPrintfFifoSize, size_t size)
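As a rough sketch of how these two calls fit together, the helper below requests a new buffer size and then reads back what the runtime reports. The name resize_printf_fifo and its out-parameter are hypothetical. As far as I know, the runtime documentation asks that the limit be set before launching any kernel that uses printf(), so the sketch assumes it is called early in the program.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: asks the runtime for a printf buffer of
// requested_bytes, then queries the size the runtime actually reports.
// Assumed to be called before any kernel that uses printf() is launched.
cudaError_t resize_printf_fifo(size_t requested_bytes, size_t *reported_bytes)
{
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize,
                                         requested_bytes);
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceGetLimit(reported_bytes, cudaLimitPrintfFifoSize);
}
```

A larger buffer is mainly useful when many threads print in a single launch, since the quoted text says older output is overwritten once the buffer fills up.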
