Autonomous

Reputation: 9075

How to implement superpixel pooling layer?

I want to implement the superpixel pooling layer described in the paper "Weakly Supervised Semantic Segmentation Using Superpixel Pooling Network". It was originally implemented in Torch, but that implementation is not available. I would like to do it in Keras, preferably with the Theano backend.

I will give a small example to show what the layer does. It takes the following inputs:

feature_map: shape = (batch_size, height, width, feature_dim)

superpixel_map: shape = (batch_size, height, width)

Let us assume two small matrices with batch_size = 1, height = width = 2, feature_dim = 1

feature_map = np.array([[[[ 0.1], [ 0.2 ]], [[ 0.3], [ 0.4]]]])  
superpixel_map = np.array([[[ 0,  0], [ 1,  2]]])

The output has shape (batch_size, n_superpixels, feature_dim), where n_superpixels = np.amax(superpixel_map) + 1.

The output is computed as follows.

Find the positions where superpixel_map == i, where i varies from 0 to n_superpixels - 1. Let's consider i = 0. The positions for i = 0 are (0, 0, 0) and (0, 0, 1)

Now average the elements at those positions in the feature map. This gives us the value (0.1 + 0.2) / 2 = 0.15. Do this for i = 1 and i = 2, that gives us the values 0.3 and 0.4 respectively.
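For checking any faster implementation against the description above, the same computation can be written as a plain NumPy loop over superpixel indices. This is only a sketch: the function name superpixel_average_reference is hypothetical, and leaving zeros for a superpixel index that does not occur in an image is an assumption, not something the paper specifies.

```python
import numpy as np

def superpixel_average_reference(feature_map, superpixel_map):
    """Average features per superpixel with explicit loops (slow, for checking only)."""
    batch_size, height, width, feature_dim = feature_map.shape
    n_superpixels = int(superpixel_map.max()) + 1
    out = np.zeros((batch_size, n_superpixels, feature_dim))
    for b in range(batch_size):
        for i in range(n_superpixels):
            mask = superpixel_map[b] == i      # (height, width) boolean mask
            if mask.any():                     # assumption: absent superpixels stay zero
                out[b, i] = feature_map[b][mask].mean(axis=0)
    return out

# The 2 x 2 example from the question.
feature_map = np.array([[[[0.1], [0.2]], [[0.3], [0.4]]]])
superpixel_map = np.array([[[0, 0], [1, 2]]])
pooled = superpixel_average_reference(feature_map, superpixel_map)
print(pooled.shape)  # (1, 3, 1)
```

The three pooled values are approximately 0.15, 0.3 and 0.4, matching the hand calculation.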

Now, the problem is made complex because usually batch_size > 1 and height, width >> 1.

I implemented a new layer in Keras that does this, but I used for loops. With height = width = 32, Theano raises a "maximum recursion depth exceeded" error. Does anyone know how this can be solved? If TensorFlow offers something that helps here, I am ready to switch to the TensorFlow backend too.

The code for my new layer is as follows:

import theano.tensor as T
from keras import backend as K
from keras.layers import Layer

class SuperpixelPooling(Layer):
    def __init__(self, n_superpixels=None, n_features=None, batch_size=None, 
                 input_shapes=None, **kwargs):
        super(SuperpixelPooling, self).__init__(**kwargs)
        self.n_superpixels = n_superpixels
        self.n_features = n_features
        self.batch_size = batch_size
        self.input_shapes = input_shapes  # has to be a length-2 tuple. The first entry is the
                                          # shape of the feature map and the second is the
                                          # shape of the superpixel map. Shapes are of the
                                          # form (height, width, feature_dim)
    def compute_output_shape(self, input_shapes):
        return (input_shapes[0][0],
                    self.n_superpixels,
                    self.n_features)
    def call(self, inputs):
        # x = feature map
        # y = superpixel map, index from [0, n-1]
        x = inputs[0]  # batch_size x m x n x k
        y = inputs[1]  # batch_size x m x n
        ht = self.input_shapes[0][0]
        wd = self.input_shapes[0][1]
        z = K.zeros(shape=(self.batch_size, self.n_superpixels, self.n_features), 
                    dtype=float)
        count = K.zeros(shape=(self.batch_size, self.n_superpixels, self.n_features), 
                        dtype=int)
        for b in range(self.batch_size):
            for i in range(ht):
                for j in range(wd):
                    z = T.inc_subtensor(z[b, y[b, i, j], :], x[b, i, j, :])
                    count = T.inc_subtensor(count[b, y[b, i, j], :], 1)
        z /= count   
        return z

I think the recursion depth exceeded problem is due to the nested for loops I have used. I do not see a way of avoiding those loops. If anyone has any suggestions, let me know.

Cross-posted here. I will update this post if I get any answers there.

Upvotes: 5

Views: 1221

Answers (1)

Autonomous

Reputation: 9075

I have my initial implementation on my GitHub. It is still not ready to use. Read on for more details. For completeness, I will post the implementation and its brief explanation over here (basically sourced from the Readme file).

class SuperpixelPooling(Layer):
    def __init__(self, n_superpixels=None, n_features=None, batch_size=None, input_shapes=None, positions=None, superpixel_positions=None, superpixel_hist=None, **kwargs):
        super(SuperpixelPooling, self).__init__(**kwargs)

        # self.input_spec = InputSpec(ndim=4)
        self.n_superpixels = n_superpixels
        self.n_features = n_features
        self.batch_size = batch_size
        self.input_shapes = input_shapes  # has to be a length-2 tuple. The first entry is the shape of the feature map and the second is the
                                          # shape of the superpixel map. Shapes are of the form (height, width, feature_dim)
        self.positions = positions  # has three columns
        self.superpixel_positions = superpixel_positions  # has two columns
        self.superpixel_hist = superpixel_hist  # is a vector
    def compute_output_shape(self, input_shapes):
        return (self.batch_size, self.n_superpixels, self.n_features)
    def call(self, inputs):
        # x = feature map
        # y = superpixel map, index from [0, n-1]
        x = inputs[0]  # batch_size x k x m x n
        y = inputs[1]  # batch_size x m x n
        ht = self.input_shapes[0][0]
        wd = self.input_shapes[0][1]
        z = K.zeros(shape=(self.batch_size, self.n_superpixels, self.n_features), dtype=float)
        z = T.inc_subtensor(z[self.superpixel_positions[:, 0], self.superpixel_positions[:, 1], :], x[self.positions[:, 0], :, self.positions[:, 1], self.positions[:, 2]])
        z /= self.superpixel_hist
        return z

Explanation:

Implementation of a superpixel pooling layer in Keras. See keras.layers.pooling for implementation.

The concept of superpixel pooling layer can be found in the paper: "Weakly Supervised Semantic Segmentation Using Superpixel Pooling Network", AAAI 2017. This layer takes two inputs, a superpixel map (size M x N) and a feature map (size K x M x N). It pools the features (in this implementation, average-pool) belonging to the same superpixel and forms a 1 x K vector where K is the feature map depth/channels.

A naive implementation requires three for loops: one over the batch, one over the rows and one over the columns of the feature map, pooling on the fly. However, this gives a "maximum recursion depth exceeded" error in Theano whenever you try to compile a model containing this layer. The error occurs even when the feature map width and height are only 32.

To overcome this problem, I thought that passing the pixel and superpixel indices as parameters to this layer would get rid of at least two of the for loops. Eventually, I was able to write the core of the entire average-pooling operation as a one-liner. You need to pass:

  1. Number of superpixels in the image
  2. Feature map depth/channels
  3. Batch size
  4. Shape of feature map and superpixel map
  5. An N x 3 matrix called positions that contains all possible combinations of indices (batch index, row, column). It only needs to be generated once during training, provided your input image size and batch size remain constant.
  6. An N x 2 matrix called superpixel_positions. Row i contains the superpixel index corresponding to the indices in row i of the matrix positions. For example, if row i of positions contains (12, 10, 20), then the same row of superpixel_positions will contain (12, sp_i), where sp_i = superpixel_map[12, 10, 20].
  7. An N x S matrix called superpixel_hist, where S is the number of superpixels in that image. As the name suggests, this matrix keeps a histogram of the superpixels present in the current image.
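The index arrays in items 5 to 7 can be sketched in NumPy as follows, together with a NumPy analogue of the one-line scatter-add. The helper names build_pooling_indices and pool_with_indices are hypothetical, and the (batch_size, n_superpixels) shape used here for superpixel_hist is an assumption made so that it broadcasts against the pooled sums; the feature map is taken to be channels-first, as in the layer's call method.

```python
import numpy as np

def build_pooling_indices(superpixel_map, n_superpixels):
    """Build index arrays from an integer map of shape (batch, height, width)."""
    batch_size, height, width = superpixel_map.shape
    # positions: one row (batch index, row, column) per pixel
    b, r, c = np.meshgrid(np.arange(batch_size), np.arange(height),
                          np.arange(width), indexing="ij")
    positions = np.stack([b.ravel(), r.ravel(), c.ravel()], axis=1)
    # superpixel_positions: (batch index, superpixel index) for the same pixel
    sp = superpixel_map[positions[:, 0], positions[:, 1], positions[:, 2]]
    superpixel_positions = np.stack([positions[:, 0], sp], axis=1)
    # superpixel_hist: pixel count per (image, superpixel); this shape is an assumption
    superpixel_hist = np.zeros((batch_size, n_superpixels))
    np.add.at(superpixel_hist,
              (superpixel_positions[:, 0], superpixel_positions[:, 1]), 1)
    return positions, superpixel_positions, superpixel_hist

def pool_with_indices(x, positions, superpixel_positions, superpixel_hist):
    """Average-pool x of shape (batch, feature_dim, height, width) per superpixel."""
    batch_size, feature_dim = x.shape[:2]
    n_superpixels = superpixel_hist.shape[1]
    z = np.zeros((batch_size, n_superpixels, feature_dim))
    # One feature vector per pixel, shape (N, feature_dim)
    gathered = x[positions[:, 0], :, positions[:, 1], positions[:, 2]]
    # Unbuffered scatter-add, so repeated (batch, superpixel) pairs accumulate
    np.add.at(z, (superpixel_positions[:, 0], superpixel_positions[:, 1]), gathered)
    # Guard against empty superpixels before dividing
    return z / np.maximum(superpixel_hist, 1)[:, :, None]

# The 2 x 2 example from the question, in channels-first layout.
x = np.array([[[[0.1, 0.2], [0.3, 0.4]]]])
superpixel_map = np.array([[[0, 0], [1, 2]]])
positions, superpixel_positions, superpixel_hist = build_pooling_indices(superpixel_map, 3)
pooled = pool_with_indices(x, positions, superpixel_positions, superpixel_hist)
```

np.add.at is used rather than z[...] += gathered because plain fancy-index assignment in NumPy applies only one update per repeated index.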

The shortcoming of this implementation is that these parameters have to be changed per image (specifically, the parameters mentioned in points 6 and 7). This is impractical when the GPU processes an entire batch at a time. I think this can be solved by passing all these parameters to the layer as external inputs. For example, they could be read from HDF5 files. I plan to do that shortly and will update this answer when that's done.
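One possible way around the per-image parameters, shown here only as a NumPy sketch and not as part of the implementation above, is to derive everything from the superpixel map inside the computation by one-hot encoding it and summing with a tensor contraction. The function name superpixel_pool_onehot is hypothetical, n_superpixels is assumed to be a fixed upper bound shared by the whole batch, and the one-hot tensor costs batch_size x height x width x n_superpixels memory. The operations would still have to be translated to the chosen Keras backend.

```python
import numpy as np

def superpixel_pool_onehot(feature_map, superpixel_map, n_superpixels):
    """feature_map: (batch, height, width, feature_dim);
    superpixel_map: (batch, height, width) integer labels."""
    # one_hot[b, h, w, s] is 1.0 where pixel (h, w) of image b lies in superpixel s
    one_hot = (superpixel_map[..., None] == np.arange(n_superpixels))
    one_hot = one_hot.astype(feature_map.dtype)
    # Per-superpixel feature sums, shape (batch, n_superpixels, feature_dim)
    sums = np.einsum("bhws,bhwk->bsk", one_hot, feature_map)
    # Per-superpixel pixel counts, shape (batch, n_superpixels)
    counts = one_hot.sum(axis=(1, 2))
    # Guard against superpixel indices that do not occur in an image
    return sums / np.maximum(counts, 1.0)[:, :, None]

# Two images with different superpixel maps, processed in one call.
feature_map = np.arange(16, dtype=float).reshape(2, 2, 2, 2)
superpixel_map = np.array([[[0, 0], [1, 2]],
                           [[2, 1], [1, 0]]])
pooled = superpixel_pool_onehot(feature_map, superpixel_map, 3)
print(pooled.shape)  # (2, 3, 2)
```

Because the superpixel map is an ordinary input here, each image in the batch can carry its own segmentation without rebuilding any index arrays.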

Upvotes: 4
