FH_P
FH_P

Reputation: 11

CUDA : unexpected printf behavior

I don't understand the behavior I observe using printf in a CUDA kernel. Can someone shed some light on this ? If this is normal why is that ? Is there a way to make sure I printf data before they are modified inside the kernel (debugging) ?

Here is the code :

~>more *
::::::::::::::
Makefile
::::::::::::::
all:
    nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
    g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
::::::::::::::
WTF.cpp
::::::::::::::
#include <iostream> // cout
#include <cstdlib>  // rand, srand

#include <cuda_runtime_api.h> // cudaXXX
void PrintOnGPU ( unsigned int const iDataSize, int * const iopData );

using namespace std;

int main ()
{
  // Allocate and initialize CPU data
  unsigned int dataSize = 4;
  srand ( time ( NULL ) ); // Random seed
  int * pCPUData = ( int * ) malloc ( sizeof ( int ) * dataSize );
  for ( unsigned int i = 0; i < dataSize; i++ ) { pCPUData[i] = rand () % 100; cout << "CPU : " << pCPUData[i] << endl; }

  // Print from GPU
  int * pGPUData = NULL;
  cudaMalloc ( ( void ** ) &pGPUData, dataSize * sizeof ( int ) );
  cudaMemcpy ( pGPUData, pCPUData, dataSize * sizeof ( int ), cudaMemcpyHostToDevice );
  PrintOnGPU ( dataSize, pGPUData );

  // Get out
  cudaFree ( pGPUData );
  if ( pCPUData ) { free ( pCPUData ); pCPUData = NULL; }
  return 0;
}
::::::::::::::
WTF.cu
::::::::::::::
#include "stdio.h"

__global__ void WTF ( unsigned int const iDataSize, int * const iopData )
{
  if ( iDataSize == 0 || !iopData ) return;

  // Don't modify : just print
  unsigned long long int tIdx = blockIdx.x * blockDim.x + threadIdx.x; // 1D grid
  if ( tIdx == 0 )
  {
    for ( unsigned int i = 0; i < iDataSize; i++ )
      printf ( "GPU : %i \n", iopData[i] );
  }
  __syncthreads();

  // Modify
  // iopData[tIdx] = 666; // WTF ?...
}

void PrintOnGPU ( unsigned int const iDataSize, int * const iopData )
{
  WTF<<<2,2>>> ( iDataSize, iopData );
}

And, as expected, I get no value above 100 (line 15 in cpp file : rand () % 100) :

~>make; ./WTF.exe
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 38
CPU : 73
CPU : 28
CPU : 82
GPU : 38 
GPU : 73 
GPU : 28 
GPU : 82 

Now I uncomment line 17 in the cu file (iopData[tIdx] = 666) : I modify all values to 666 (that is above 100). As I have 4 data (dataSize = 4 in cpp file), a 2 X 2 grid and a __syncthreads () before data modification in the CUDA kernel, I should never printf any modified data, right ? However, I get this (print modified data with value 666) :

 ~>make; ./WTF.exe
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 29
CPU : 72
CPU : 66
CPU : 90
GPU : 29 
GPU : 72 
GPU : 666 
GPU : 666 

I don't understand why these 666 appear : to me, they should not appear ?! If this behavior is normal, why is that ?

FH

Upvotes: 0

Views: 991

Answers (1)

Robert Crovella
Robert Crovella

Reputation: 151799

It's because you are launching 2 threadblocks, and those threadblocks can execute in any order, simultaneously or sequentially.

Suppose you have the troublesome line uncommented. Now suppose threadblock 1 runs first and completes before threadblock 0. Then threadblock 0 runs. But threadblock 0 is doing the printing, and it is printing all 4 values. So the values previously set by threadblock 1 to 666 are printed out by threadblock 0.

This could not happen if threadblock 0 runs first, correspondingly my guess is you never see the first 2 GPU values listed as 666, only the last 2 (emanating from threadblock 1). You would also never see it if you were launching only 1 block, regardless of the number of threads (at least with the posted kernel code).

You may also be confused thinking that __syncthreads() is a device-wide sync. It is not. It acts as a barrier for the threads in a threadblock only. There is no synchronization between separate threadblocks.

Upvotes: 3

Related Questions