Reputation: 11
I don't understand the behavior I observe using printf in a CUDA kernel. Can someone shed some light on this ? If this is normal why is that ? Is there a way to make sure I printf data before they are modified inside the kernel (debugging) ?
Here is the code :
~>more *
::::::::::::::
Makefile
::::::::::::::
all:
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
::::::::::::::
WTF.cpp
::::::::::::::
#include <iostream> // cout
#include <cstdlib> // rand, srand
#include <cuda_runtime_api.h> // cudaXXX
void PrintOnGPU ( unsigned int const iDataSize, int * const iopData );
using namespace std;
int main ()
{
// Allocate and initialize CPU data
unsigned int dataSize = 4;
srand ( time ( NULL ) ); // Random seed
int * pCPUData = ( int * ) malloc ( sizeof ( int ) * dataSize );
for ( unsigned int i = 0; i < dataSize; i++ ) { pCPUData[i] = rand () % 100; cout << "CPU : " << pCPUData[i] << endl; }
// Print from GPU
int * pGPUData = NULL;
cudaMalloc ( ( void ** ) &pGPUData, dataSize * sizeof ( int ) );
cudaMemcpy ( pGPUData, pCPUData, dataSize * sizeof ( int ), cudaMemcpyHostToDevice );
PrintOnGPU ( dataSize, pGPUData );
// Get out
cudaFree ( pGPUData );
if ( pCPUData ) { free ( pCPUData ); pCPUData = NULL; }
return 0;
}
::::::::::::::
WTF.cu
::::::::::::::
#include "stdio.h"
__global__ void WTF ( unsigned int const iDataSize, int * const iopData )
{
if ( iDataSize == 0 || !iopData ) return;
// Don't modify : just print
unsigned long long int tIdx = blockIdx.x * blockDim.x + threadIdx.x; // 1D grid
if ( tIdx == 0 )
{
for ( unsigned int i = 0; i < iDataSize; i++ )
printf ( "GPU : %i \n", iopData[i] );
}
__syncthreads();
// Modify
// iopData[tIdx] = 666; // WTF ?...
}
void PrintOnGPU ( unsigned int const iDataSize, int * const iopData )
{
WTF<<<2,2>>> ( iDataSize, iopData );
}
And, as expected, I get no value above 100 (line 15 in cpp file : rand () % 100) :
~>make; ./WTF.exe
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 38
CPU : 73
CPU : 28
CPU : 82
GPU : 38
GPU : 73
GPU : 28
GPU : 82
Now I uncomment line 17 in the cu file (iopData[tIdx] = 666) : I modify all values to 666 (that is above 100). As I have 4 data (dataSize = 4 in cpp file), a 2 X 2 grid and a __syncthreads () before data modification in the CUDA kernel, I should never printf any modified data, right ? However, I get this (print modified data with value 666) :
~>make; ./WTF.exe
nvcc -o WTF.cu.o -arch=sm_21 -c WTF.cu
g++ -o WTF.exe -I/usr/local/cuda/include WTF.cpp WTF.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 29
CPU : 72
CPU : 66
CPU : 90
GPU : 29
GPU : 72
GPU : 666
GPU : 666
I don't understand why these 666 appear : to me, they should not appear ?! If this behavior is normal, why is that ?
FH
Upvotes: 0
Views: 991
Reputation: 151799
It's because you are launching 2 threadblocks, and those threadblocks can execute in any order, simultaneously or sequentially.
Suppose you have the troublesome line uncommented. Now suppose threadblock 1 runs first and completes before threadblock 0. Then threadblock 0 runs. But threadblock 0 is doing the printing, and it is printing all 4 values. So the values previously set by threadblock 1 to 666 are printed out by threadblock 0.
This could not happen if threadblock 0 runs first, correspondingly my guess is you never see the first 2 GPU values listed as 666, only the last 2 (emanating from threadblock 1). You would also never see it if you were launching only 1 block, regardless of the number of threads (at least with the posted kernel code).
You may also be confused thinking that __syncthreads()
is a device-wide sync. It is not. It acts as a barrier for the threads in a threadblock only. There is no synchronization between separate threadblocks.
Upvotes: 3