What might cause "Undefined Behaviour" in this parallel GPU code?

Question

Lets assume core1 and core2 try writing their variables a and b to same memory location.

How can UB be explained here?

We dont know if a or b is written to that memory location(as a last action).
We dont even know what is written there (a garbage)
Even the target memory address can be miscalculated(segfault?).
Some logical gates make wrong currents and CPU disables itself
CPU's frequency information becomes corrupt and goes high overclock(and break itself)

Can I assume only the first option is valid for all vendors of CPU( and GPU)?

I just converted below code into a parallel GPU code and it seems to be working fine.

Generic code:

for (j=0; j=0 && j+y>=0 && i+x4 ? r-4 : 0; // UB
            fire_g[j][i] = g>4 ? g-4 : 0; // UB
            fire_b[j][i] = b>4 ? b-4 : 0;
        }

Opencl:

"   int i=get_global_id(0); int j=get_global_id(1);"
"   int VIDXRES="+std::to_string(kkVIDXRES)+";"
                        "   int VIDYRES="+std::to_string(kkVIDYRES)+";"
                        "   int XRES="+std::to_string(kkXRES)+";"
                        "   int CELL="+std::to_string(kkCELL)+";"
                        "   int YRES="+std::to_string(kkYRES)+";"

                        "   int x=0,y=0,r=0,g=0,b=0,nx=0,ny=0;"

                        "       r = fire_r[j*(XRES/CELL)+i];"
                        "       g = fire_g[j*(XRES/CELL)+i];"
                        "       b = fire_b[j*(XRES/CELL)+i];"

                        "       int counterx=0;"
                        "       if (r || g || b)"
                        "       for (y=-CELL; y<2*CELL; y++){"
                        "       for (x=-CELL; x<2*CELL; x++){"
                        "       addpixel(i*CELL+x, j*CELL+y, r, g, b, fire_alpha[(y+CELL)*(3*CELL)+(x+CELL)],vid,vido);"
                        "       }}"

                        "       r *= 8;"
                        "       g *= 8;"
                        "       b *= 8;"
                        "       for (y=-1; y<2; y++){"
                        "       for (x=-1; x<2; x++){"
                        "       if ((x || y) && i+x>=0 && j+y>=0 && i+x4 ? r-4 : 0);"
                        "       fire_g[j*(XRES/CELL)+i] = (g>4 ? g-4 : 0);"
                        "       fire_b[j*(XRES/CELL)+i] = (b>4 ? b-4 : 0);"

Here is picture of some rare artifacts of a 2D NDrangeKernel 's local boundary UB. Can these kill my GPU?

enter image description here

Sigi · Accepted Answer

On xf86 and xf86_64 architectures it means We dont know if a or b is written to that memory location(as a last action), because load/store operations of 32 (for both) or 64 bit (xf86_64 only) memory aligned datatypes are atomic.

On other architectures usually We dont even know what is written there (a garbage) is a valid answer - for sure on RISC architectures, I currently don't know on GPU's.

Note that The fact the code works doesn't imply that it is correct and in the 99% of the times it's the source of sentences like "there's a compiler bug, the code was working until the previous version" or "the code works on the development machine. The server selected for production is broken" :)

EDIT:

On NVidia GPUs we have weakly-ordered memory model. In the description on the Cuda C Programming guide it's not explicitly stated that store operations are atomic. The write operations come from the same thread, so it does not mean that load/store operations are atomic.

What might cause "Undefined Behaviour" in this parallel GPU code?

Answers (2)

Related Questions

What might cause &quot;Undefined Behaviour&quot; in this parallel GPU code?

Answers (2)

Related Questions

What might cause "Undefined Behaviour" in this parallel GPU code?