Reputation: 11910
Lets assume core1 and core2 try writing their variables a and b to same memory location.
How can UB be explained here?
Can I assume only the first option is valid for all vendors of CPU( and GPU)?
I just converted below code into a parallel GPU code and it seems to be working fine.
Generic code:
for (j=0; j<YRES/CELL; j++) // this is parallelized
for (i=0; i<XRES/CELL; i++) // this is parallelized
{
r = fire_r[j][i];
g = fire_g[j][i];
b = fire_b[j][i];
if (r || g || b)
for (y=-CELL; y<2*CELL; y++)
for (x=-CELL; x<2*CELL; x++)
addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[y+CELL][x+CELL]);
//addpixel accesses neighbour cells' informations and writes on them
//and makes UB
r *= 8;
g *= 8;
b *= 8;
for (y=-1; y<2; y++)
for (x=-1; x<2; x++)
if ((x || y) && i+x>=0 && j+y>=0 && i+x<XRES/CELL && j+y<YRES/CELL)
{
r += fire_r[j+y][i+x];
g += fire_g[j+y][i+x];
b += fire_b[j+y][i+x];
}
r /= 16;
g /= 16;
b /= 16;
fire_r[j][i] = r>4 ? r-4 : 0; // UB
fire_g[j][i] = g>4 ? g-4 : 0; // UB
fire_b[j][i] = b>4 ? b-4 : 0;
}
Opencl:
" int i=get_global_id(0); int j=get_global_id(1);"
" int VIDXRES="+std::to_string(kkVIDXRES)+";"
" int VIDYRES="+std::to_string(kkVIDYRES)+";"
" int XRES="+std::to_string(kkXRES)+";"
" int CELL="+std::to_string(kkCELL)+";"
" int YRES="+std::to_string(kkYRES)+";"
" int x=0,y=0,r=0,g=0,b=0,nx=0,ny=0;"
" r = fire_r[j*(XRES/CELL)+i];"
" g = fire_g[j*(XRES/CELL)+i];"
" b = fire_b[j*(XRES/CELL)+i];"
" int counterx=0;"
" if (r || g || b)"
" for (y=-CELL; y<2*CELL; y++){"
" for (x=-CELL; x<2*CELL; x++){"
" addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[(y+CELL)*(3*CELL)+(x+CELL)],vid,vido);"
" }}"
" r *= 8;"
" g *= 8;"
" b *= 8;"
" for (y=-1; y<2; y++){"
" for (x=-1; x<2; x++){"
" if ((x || y) && i+x>=0 && j+y>=0 && i+x<XRES/CELL && j+y<YRES/CELL)"
" {"
" r += fire_r[(j+y)*(XRES/CELL)+(i+x)];"
" g += fire_g[(j+y)*(XRES/CELL)+(i+x)];"
" b += fire_b[(j+y)*(XRES/CELL)+(i+x)];"
" }}}"
" r /= 16;"
" g /= 16;"
" b /= 16;"
" fire_r[j*(XRES/CELL)+i] = (r>4 ? r-4 : 0);"
" fire_g[j*(XRES/CELL)+i] = (g>4 ? g-4 : 0);"
" fire_b[j*(XRES/CELL)+i] = (b>4 ? b-4 : 0);"
Here is picture of some rare artifacts of a 2D NDrangeKernel 's local boundary UB. Can these kill my GPU?
Upvotes: 1
Views: 279
Reputation: 4926
On xf86 and xf86_64 architectures it means We dont know if a or b is written to that memory location(as a last action), because load/store operations of 32 (for both) or 64 bit (xf86_64 only) memory aligned datatypes are atomic.
On other architectures usually We dont even know what is written there (a garbage) is a valid answer - for sure on RISC architectures, I currently don't know on GPU's.
Note that The fact the code works doesn't imply that it is correct and in the 99% of the times it's the source of sentences like "there's a compiler bug, the code was working until the previous version" or "the code works on the development machine. The server selected for production is broken" :)
EDIT:
On NVidia GPUs we have weakly-ordered memory model. In the description on the Cuda C Programming guide it's not explicitly stated that store operations are atomic. The write operations come from the same thread, so it does not mean that load/store operations are atomic.
Upvotes: 4
Reputation: 4177
For the code above, IMHO the first option is the only possible one. Basically, if you assume that you have enough threads/processors to execute all the loops in parallel, the inner nested loops (the x
and y
ones) will have undetermined values.
For example, if we consider only the
r += fire_r[j+y][i+x];
section, the value at fire_r[j+y][i+x]
can be the original one just as well as the result of another instance of the same loop being finished in another thread.
Upvotes: 1