Reputation: 173
I am writing an FFT in GLSL and compiling to MSL using glslang and SpirvCross to support macOS. The code works fine on Windows with GLSL, but on Metal I see output that makes no sense. However, I have worked through debugging the shaders using the Metal Shader debugger. The debugger shows each assignment to a variable with its new value. Here's the problem: The logic shown is outright wrong. I'm not sure if this is a driver error, or if it's just some nuanced logic error. The error occurs in the bit reversal stage, and I've gone ahead and done my best to create a minimum example.
GLSL
#define SIZE 8
#define LOG_SIZE int(log2(SIZE))
layout (binding = 0) buffer ssbo
{
vec2 data[SIZE];
};
layout (local_size_x = SIZE) in;
void main()
{
uint thread = uint(gl_LocalInvocationID.x);
uint number = thread;
uint reversed = 0;
for (uint i = 0; i < LOG_SIZE; i++)
{
reversed <<= 1;
reversed |= (number & 1);
number >>= 1;
}
data[thread] = vec2(reversed, thread);
}
Generated MSL
#include <metal_stdlib>
#include <simd/simd.h>
using namespace metal;
struct ssbo
{
float2 data[8];
};
constant uint3 gl_WorkGroupSize [[maybe_unused]] = uint3(8u, 1u, 1u);
kernel void main0(device ssbo& _49 [[buffer(0)]], uint3 gl_LocalInvocationID [[thread_position_in_threadgroup]])
{
uint thread0 = gl_LocalInvocationID.x;
uint number = thread0;
uint reversed = 0u;
for (uint i = 0u; i < 3u; i++)
{
reversed = reversed << uint(1);
reversed |= (number & 1u);
number = number >> uint(1);
}
_49.data[thread0] = float2(float(reversed), float(thread0));
}
The GLSL correctly produces the outputs of all of the inputs with the threads paired with their bit reversal. Fetching the data on the CPU after a complete GPU flush gives this output.
{ 0, 0 }, { 4, 1 }, { 2, 2 }, { 6, 3 }, { 1, 4 }, { 5, 5 }, { 3, 6 }, { 7, 7 }
Here is the output that MSL gives after the same GPU flush.
{ 0, 0 }, { 0, 1 }, { 0, 2 }, { 0, 3 }, { 2, 4 }, { 2, 5 }, { 2, 6 }, { 2, 7 }
I'm totally unsure why this happens. When I step through the GPU shader, I get this completely bogus step in the the third iteration of the loop. The reversed value goes to zero after boolean OR-ing it with the number. I have verified my algorithm on the CPU, so I'm totally clueless. Here is a screenshot of the Metal debugger.
What's even weirder is that certain changes within the metal debugger can fix this result. For instance, changing the code that assigns the thread index can cause all of the even values to be computed correctly. However, if I wrap all of the values that are out of range to the odd values, then none of the values are correct!
Works for evens
uint thread0 = gl_LocalInvocationID.x * 2;
if (thread0 >= 8)
return;
Fails for all!
uint thread0 = gl_LocalInvocationID.x * 2;
if (thread0 >= 8)
thread0 = (thread0 % 8) + 1;
I'm guessing this is some sort of driver bug, but it's extremely frustrating, and I am at a loss for approaches to debug this. I know this is a relatively involved question, so thank you to anyone who has any insight!
Upvotes: 1
Views: 46