Reputation: 2712
I'm trying to convert a TensorFlow graph to CoreML and I'm following this tutorial. There's this bit of code that I don't quite understand:
#include <metal_stdlib>
using namespace metal;

kernel void swish(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    ushort3 gid [[thread_position_in_grid]])
{
    // Skip threads that fall outside the output texture.
    if (gid.x >= outTexture.get_width() ||
        gid.y >= outTexture.get_height()) {
        return;
    }

    // Read 4 channels at pixel (gid.x, gid.y) from slice gid.z.
    const float4 x = float4(inTexture.read(gid.xy, gid.z));
    // Swish activation: x * sigmoid(x) = x / (1 + exp(-x)).
    const float4 y = x / (1.0f + exp(-x));
    outTexture.write(half4(y), gid.xy, gid.z);
}
What I don't understand is the use of gid here. Isn't the grid two-dimensional? What does gid.z signify? Isn't gid.x the x-coordinate of the current pixel?
Upvotes: 0
Views: 127
Reputation: 7892
gid.x and gid.y are the x/y coordinates of the current pixel. So when you do a texture.read(gid.xy), it gives you 4 channels' worth of pixel data.
But the "images" used in neural networks may have many more than 4 channels. That's why the data type for the textures is texture2d_array<> instead of just texture2d<>.
The gid.z value refers to the index of the texture "slice" in this array. If the image/tensor has 32 channels, then there are 8 texture slices (because each slice stores up to 4 channels of data).
So the grid really is three-dimensional: (x, y, slice).
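To make the slice arithmetic concrete, here is a minimal host-side sketch in plain C++ (not Metal). The names sliceCount, locateChannel, and gridFor are hypothetical helpers made up for illustration, and the sketch assumes channels are packed in order, so that channel c lands in slice c / 4 at component c % 4.

```cpp
#include <cassert>
#include <cstddef>

// Each slice of a texture2d_array stores one 4-component pixel per (x, y),
// so up to 4 feature channels fit in a single slice.
constexpr std::size_t kChannelsPerSlice = 4;

// Number of slices needed to hold `channels` feature channels (rounds up).
std::size_t sliceCount(std::size_t channels) {
    return (channels + kChannelsPerSlice - 1) / kChannelsPerSlice;
}

// Where a given feature channel lives, ASSUMING in-order packing.
struct ChannelLocation {
    std::size_t slice;      // plays the role of gid.z in the kernel
    std::size_t component;  // 0..3, i.e. .x/.y/.z/.w of the float4
};

ChannelLocation locateChannel(std::size_t channel, std::size_t channels) {
    assert(channel < channels);  // the channel index must be in range
    return {channel / kChannelsPerSlice, channel % kChannelsPerSlice};
}

// The logical compute grid: one thread per (x, y, slice).
struct GridSize {
    std::size_t width;
    std::size_t height;
    std::size_t depth;
};

GridSize gridFor(std::size_t width, std::size_t height, std::size_t channels) {
    return {width, height, sliceCount(channels)};
}
```

With these helpers, sliceCount(32) gives 8, matching the example above, and locateChannel(13, 32) points at slice 3, component 1. Each thread in the kernel therefore processes 4 channels of one pixel at once, which is why the swish math operates on a float4.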
Upvotes: 1