Florin Lucaciu

Reputation: 91

Semantic Segmentation with Encoder-Decoder CNNs

Apologies for any misuse of technical terms. I am working on a semantic segmentation project using CNNs, trying to implement an encoder-decoder architecture, so the output is the same size as the input.

How do you design the labels? What loss function should one apply, especially in the situation of heavy class imbalance (where the ratio between the classes varies from image to image)?

The problem deals with two classes (objects of interest and background). I am using Keras with the TensorFlow backend.

So far, I am designing the expected outputs to have the same dimensions as the input images, applying pixel-wise labeling. The final layer of the model has either a softmax activation (for the 2 classes) or a sigmoid activation (to express the probability that a pixel belongs to the objects class). I am having trouble designing a suitable objective function for such a task, of the form:

function(y_pred, y_true),

in agreement with Keras.

Please try to be specific about the dimensions of the tensors involved (input/output of the model). Any thoughts and suggestions are much appreciated. Thank you!

Upvotes: 4

Views: 859

Answers (3)

Kevin Roy

Reputation: 111

Two ways:

  1. You could try 'flattening':

    model.add(Reshape((NUM_CLASSES, HEIGHT * WIDTH)))  # collapse the spatial dims: (NUM_CLASSES, HEIGHT*WIDTH)
    model.add(Permute((2, 1)))  # now (HEIGHT*WIDTH, NUM_CLASSES)
    # Apply an activation here, e.g. a softmax over the class axis
    
  2. One-hot encoding every pixel:

    In this case your final layer should upsample/unpool/deconvolve to HEIGHT x WIDTH x NUM_CLASSES, so your output is essentially of shape (HEIGHT, WIDTH, NUM_CLASSES).
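A minimal sketch of per-pixel one-hot encoding in NumPy (the point here is the shapes, not any particular Keras utility; the small mask is made up for illustration):

```python
import numpy as np

HEIGHT, WIDTH, NUM_CLASSES = 4, 4, 2

# Integer-encoded label mask: each pixel holds its class index (0 = background, 1 = object).
mask = np.zeros((HEIGHT, WIDTH), dtype=np.int64)
mask[1:3, 1:3] = 1  # a small square of the "object" class

# One-hot encode every pixel: (HEIGHT, WIDTH) -> (HEIGHT, WIDTH, NUM_CLASSES)
one_hot = np.eye(NUM_CLASSES)[mask]

print(one_hot.shape)  # (4, 4, 2)
print(one_hot[1, 1])  # [0. 1.] -- an object pixel
print(one_hot[0, 0])  # [1. 0.] -- a background pixel
```

This (HEIGHT, WIDTH, NUM_CLASSES) tensor is what the upsampled final layer described above should match.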

Upvotes: 1

Thomas Pinetz

Reputation: 7148

I suggest starting with a base architecture used in practice, like this one for nerve segmentation: https://github.com/EdwardTyantov/ultrasound-nerve-segmentation. There, a dice loss is used as the loss function. This works very well for a two-class problem, as has been shown in the literature: https://arxiv.org/pdf/1608.04117.pdf.

Another loss function that has been widely used for such problems is cross entropy. For problems like yours, long and short skip connections are most commonly deployed to stabilize training, as noted in the paper above.
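A minimal sketch of the soft dice loss mentioned above, written in plain NumPy to show the math; in Keras you would express the same formula with `keras.backend` ops so it stays differentiable (the `smooth` constant is an assumption here, not taken from the linked repository):

```python
import numpy as np

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    """Soft dice loss for binary segmentation.

    y_true: ground-truth mask in {0, 1}, shape (HEIGHT, WIDTH)
    y_pred: predicted probabilities in [0, 1], same shape
    Returns 1 - dice coefficient, so a perfect prediction gives 0.
    """
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return 1.0 - dice

y_true = np.zeros((4, 4))
y_true[1:3, 1:3] = 1.0  # 4 object pixels out of 16

print(soft_dice_loss(y_true, y_true))            # 0.0 for a perfect match
print(soft_dice_loss(y_true, np.zeros((4, 4))))  # 0.8 for a complete miss (smooth term keeps it below 1)
```

Because the dice coefficient is a ratio of overlap to total mass, it is insensitive to the number of background pixels, which is why it handles class imbalance well.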

Upvotes: 1

Marcin Możejko

Reputation: 40506

Actually, when you use a TensorFlow backend you can simply apply one of the predefined Keras objectives in the following manner:

output = Convolution2D(number_of_classes, # 1 for binary case
                       filter_height,
                       filter_width,
                       activation = "softmax")(input_to_output) # or "sigmoid" for binary
... 
model.compile(loss = "categorical_crossentropy", ...) # or "binary_crossentropy" for binary

And then feed either a one-hot encoded feature map or a matrix of shape (image_height, image_width) with integer-encoded classes (remember that in this case you should use sparse_categorical_crossentropy as the loss).
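To make the two label formats concrete, here is a NumPy sketch (not Keras code) of per-pixel categorical crossentropy, showing that one-hot targets and integer targets yield the same loss value:

```python
import numpy as np

HEIGHT, WIDTH, NUM_CLASSES = 2, 2, 3
rng = np.random.default_rng(0)

# Softmax-like predictions: a probability distribution over classes at every pixel.
logits = rng.normal(size=(HEIGHT, WIDTH, NUM_CLASSES))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Integer labels, shape (HEIGHT, WIDTH) -- what sparse_categorical_crossentropy expects.
sparse_labels = rng.integers(0, NUM_CLASSES, size=(HEIGHT, WIDTH))
# One-hot labels, shape (HEIGHT, WIDTH, NUM_CLASSES) -- what categorical_crossentropy expects.
one_hot_labels = np.eye(NUM_CLASSES)[sparse_labels]

# Per-pixel crossentropy computed both ways.
ce_one_hot = -(one_hot_labels * np.log(probs)).sum(axis=-1)
ce_sparse = -np.log(np.take_along_axis(probs, sparse_labels[..., None], axis=-1))[..., 0]

print(np.allclose(ce_one_hot, ce_sparse))  # True -- same loss, different label encoding
```

The choice between the two is therefore purely about memory and convenience, not about what the model learns.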

To deal with the class imbalance (I guess it's because of the background class) I strongly recommend that you carefully read the answers to this Stack Overflow question.
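One common remedy discussed in answers like those is to weight the loss per class. A NumPy sketch of class-weighted binary crossentropy (the `pos_weight` value is illustrative, not prescribed; in practice it is often set near the background-to-object pixel ratio):

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_pred, pos_weight=10.0, eps=1e-7):
    """Binary crossentropy with an extra weight on the rare positive class.

    y_true: mask in {0, 1}; y_pred: probabilities in (0, 1), same shape.
    pos_weight > 1 makes missed object pixels cost more than missed background.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

y_true = np.zeros((4, 4))
y_true[1:3, 1:3] = 1.0  # only 4 of 16 pixels are "object"

# Predicting all background is penalized much more once positives are up-weighted.
all_background = np.full((4, 4), 0.01)
print(weighted_binary_crossentropy(y_true, all_background, pos_weight=1.0))
print(weighted_binary_crossentropy(y_true, all_background, pos_weight=10.0))
```

In Keras the same formula can be wrapped as a custom `loss(y_true, y_pred)` using backend ops, which matches the signature the question asks for.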

Upvotes: 1
