Bibrak

Reputation: 554

How does printf work on CUDA compute capability >= 2.0?

In earlier days printf was not supported in device code, so we would either run CUDA programs in the emulator or copy variables back to the host and print them there.

Now that CUDA devices of compute capability 2.0 and higher support printf, I am curious to know how this works. How does the GPU internally get printf output onto the screen? What was the limiting factor on compute capability 1.x?

Upvotes: 2

Views: 1654

Answers (1)

Albert

Reputation: 68240

From the CUDA C Programming Guide:

printf prints formatted output from a kernel to a host-side output stream.

The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten. It is flushed only when one of these actions is performed:

...

Internally printf() uses a shared data structure and so it is possible that calling printf() might change the order of execution of threads. In particular, a thread which calls printf() might take a longer execution path than one which does not call printf(), and that path length is dependent upon the parameters of the printf(). Note, however, that CUDA makes no guarantees of thread execution order except at explicit __syncthreads() barriers, so it is impossible to tell whether execution order has been modified by printf() or by other scheduling behaviour in the hardware.
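As a minimal sketch (assuming a CUDA toolkit and a device of compute capability 2.0 or higher; the kernel `report` and the helper `launch_report` are hypothetical names), every thread below calls printf, and the host then calls cudaDeviceSynchronize(), which is one of the points at which the buffered output is flushed to the host's stdout:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread formats one line. The text is not printed immediately:
// printf records its arguments in the device-side FIFO, and the host
// runtime prints them when the buffer is flushed.
__global__ void report(int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        printf("block %d, thread %d -> global id %d\n",
               (int)blockIdx.x, (int)threadIdx.x, tid);
    }
}

// Hypothetical helper: launch the kernel, flush its output, return 0 on success.
int launch_report(int n)
{
    const int threads = 4;
    const int blocks = (n + threads - 1) / threads;
    report<<<blocks, threads>>>(n);

    // Launch-configuration errors are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Synchronizing flushes the printf buffer; the buffer is not flushed
    // automatically at program exit, so skipping this can lose output.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    return launch_report(8);
}
```

Compiled with nvcc, this should print eight lines, but the order in which lines from different blocks appear is not something to rely on, which matches the guide's point that thread execution order is only guaranteed at explicit barriers.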

The following API functions get and set the size of the buffer used to transfer the printf() arguments and internal metadata to the host (default is 1 megabyte):

  • cudaDeviceGetLimit(size_t* size, cudaLimitPrintfFifoSize)
  • cudaDeviceSetLimit(cudaLimitPrintfFifoSize, size_t size)
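A sketch of using these two calls, under the same assumptions as above; the 16 MiB size and the names `chatty` and `set_printf_fifo` are illustrative choices, not part of the guide. The resize happens before any kernel that calls printf is launched, because the runtime returns an error if this limit is changed after such a launch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that emits many printf records.
__global__ void chatty(int linesPerThread)
{
    for (int i = 0; i < linesPerThread; ++i)
        printf("thread %d line %d\n", (int)threadIdx.x, i);
}

// Hypothetical helper: request a printf FIFO of `bytes` and return the size
// the runtime reports afterwards, or 0 if either call fails.
size_t set_printf_fifo(size_t bytes)
{
    if (cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes) != cudaSuccess)
        return 0;
    size_t actual = 0;
    if (cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize) != cudaSuccess)
        return 0;
    return actual;
}

int main()
{
    size_t before = 0;
    cudaDeviceGetLimit(&before, cudaLimitPrintfFifoSize);
    printf("printf FIFO before: %zu bytes\n", before);  // guide: default is 1 MB

    // Must run before the first launch of a kernel that uses printf.
    size_t after = set_printf_fifo(16u * 1024u * 1024u);
    if (after == 0) {
        fprintf(stderr, "could not resize the printf FIFO\n");
        return 1;
    }
    printf("printf FIFO after: %zu bytes\n", after);

    // If the FIFO were too small for all of these records, the oldest ones
    // would be overwritten before the flush below.
    chatty<<<1, 32>>>(100);
    cudaDeviceSynchronize();  // flush the FIFO to stdout
    return 0;
}
```

Enlarging the buffer up front is the only lever the guide describes against overwritten output, since the buffer is circular and is flushed only at the listed host-side actions.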

Upvotes: 6
