John M.

Reputation: 2712

Convert Tensorflow graph to CoreML

I'm trying to convert a Tensorflow graph to CoreML and I'm following this tutorial. There's this bit of code that I don't quite understand:

#include <metal_stdlib>
using namespace metal;

kernel void swish(
  texture2d_array<half, access::read> inTexture [[texture(0)]],
  texture2d_array<half, access::write> outTexture [[texture(1)]],
  ushort3 gid [[thread_position_in_grid]])
{
  if (gid.x >= outTexture.get_width() || 
      gid.y >= outTexture.get_height()) {
    return;
  }

  const float4 x = float4(inTexture.read(gid.xy, gid.z));
  const float4 y = x / (1.0f + exp(-x));             
  outTexture.write(half4(y), gid.xy, gid.z);
}

What I don't understand is the use of gid here. Isn't the grid two-dimensional? What does gid.z signify? Isn't gid.x the x-coordinate of the current pixel?

Upvotes: 0

Views: 127

Answers (1)

Matthijs Hollemans

Reputation: 7892

gid.x and gid.y are the x/y coordinates of the current pixel. So when you do a texture.read(gid.xy), it gives you 4 channels' worth of pixel data.

But the "images" used in neural networks may have many more than 4 channels. That's why the data type for the textures is texture2d_array<> instead of just texture2d<>.

The gid.z value refers to the index of the texture "slice" in this array. If the image/tensor has 32 channels, then there are 8 texture slices (because each slice stores up to 4 channels of data).

So the grid really is three dimensional: (x, y, slice).
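
To make the slice arithmetic concrete, here is a minimal sketch in plain C++ (not Metal, so it can be checked on the CPU). The helper names sliceCount and locateChannel are hypothetical and not part of any Apple API, and the sketch assumes channels are packed in order, four per slice, which may not match every layout.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of Metal or Core ML) that mirror the
// channel-to-slice arithmetic described above.
// Assumption: channels are packed in order, 4 per slice.

constexpr std::uint32_t kChannelsPerSlice = 4;  // one texel holds 4 channels

// Number of texture slices needed for `channels` channels (rounds up).
constexpr std::uint32_t sliceCount(std::uint32_t channels) {
    return (channels + kChannelsPerSlice - 1) / kChannelsPerSlice;
}

struct SliceLocation {
    std::uint32_t slice;      // the value gid.z would have in the kernel
    std::uint32_t component;  // 0..3, i.e. .x/.y/.z/.w of the texel in that slice
};

// Where a given channel index ends up under the packing assumed above.
constexpr SliceLocation locateChannel(std::uint32_t channel) {
    return SliceLocation{channel / kChannelsPerSlice, channel % kChannelsPerSlice};
}

// Compile-time checks of the example from the answer.
static_assert(sliceCount(32) == 8, "32 channels need 8 slices");
static_assert(locateChannel(13).slice == 3, "channel 13 is in slice 3");
static_assert(locateChannel(13).component == 1, "as that texel's second component");
```

Under these assumptions, a 32-channel image would be processed by a grid of width x height x 8 threads, and each thread applies swish to the 4 channels stored at (gid.x, gid.y) in slice gid.z.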

Upvotes: 1
