Ambareesh

Reputation: 363

Can I use a Sigmoid activation for my output layer, even if my CNN model is doing a regression?

Final objective: Object Midpoint calculation.

I have a small dataset (around 120 images), each showing the same object. The labels are the normalized x,y coordinates of the object's midpoint in the image (always between 0 and 1).

e.g. x = image_005 ; y = (0.1, 0.15) for an image with the object placed near the bottom left corner

I am trying to use a ResNet architecture customized for my image size (all images are the same size). Since the output values are always between 0 and 1 for both coordinates, I was wondering whether it is possible to use a sigmoid activation in my last layer:

 X = Dense(2, activation='sigmoid', name='fc', kernel_initializer = glorot_uniform(seed=0))(X)

instead of a linear activation (as is often advised for regression).

For the loss function, I use MSE with the 'rmsprop' optimizer. In addition to accuracy and MSE, I have written a custom metric that tells me whether the predicted points are off from the labels by more than 5%:

model.compile(optimizer='rmsprop', loss='mean_squared_error', metrics=['mse','acc',perc_midpoint_err])
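
The implementation of perc_midpoint_err is not shown here. A metric of this kind could look roughly like the sketch below, which assumes that "off by more than 5%" means a Euclidean distance above 0.05 in normalized image coordinates. The function name and the threshold parameter are hypothetical, and the sketch uses plain Python for readability; a Keras metric would perform the same computation on tensors.

```python
import math


def fraction_off_by_more_than(y_true, y_pred, threshold=0.05):
    """Return the fraction of samples whose predicted midpoint lies farther
    than `threshold` (in normalized units) from the labeled midpoint.

    Assumption: the distance is Euclidean. A per-coordinate check would be
    an equally plausible reading of "off by more than 5%".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    misses = 0
    for (tx, ty), (px, py) in zip(y_true, y_pred):
        # Distance between label and prediction in normalized coordinates.
        if math.hypot(px - tx, py - ty) > threshold:
            misses += 1
    return misses / len(y_true)


labels = [(0.1, 0.15), (0.5, 0.5)]
preds = [(0.12, 0.16), (0.6, 0.5)]
print(fraction_off_by_more_than(labels, preds))
```

A metric like this is more informative for coordinate regression than 'acc', which compares outputs as if they were class predictions.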

I am not getting good results after training the model for around 150 epochs (I experimented with different batch sizes too).

Should I change the activation layer to linear? Or is there a different modification I can do to my model? Or is ResNet completely unsuitable for this task?

Upvotes: 1

Views: 2938

Answers (2)

Rishab P

Reputation: 1633

Apart from what you have done, there are lots of other things that you can do:

  1. Use image augmentation to generate more data, and normalize the images. Geometric augmentations (flips, shifts, rotations) must transform the midpoint labels as well.
  2. Make a deeper model with a few more convolution layers.
  3. Use a suitable weight initializer, for example He normal, for the convolution layers.
  4. Use BatchNormalization between layers to normalize the activations to roughly zero mean and unit standard deviation.
  5. Try using a crossentropy loss (binary crossentropy accepts targets in [0, 1]), as it can give better-behaved gradients. With MSE on a sigmoid output the gradients can become very small over time, although MSE is generally preferred for regression problems. You can also try MeanSquaredLogarithmicError.
  6. Try changing the optimizer to Adam or to stochastic gradient descent with Nesterov momentum (which often does better than Adam on the validation set).
  7. If you have a few more classes in your dataset and a class imbalance problem, you can use focal loss, a variant of crossentropy loss that penalizes misclassified examples more than correctly classified ones. Reducing the batch size and upsampling should also help.
  8. Use Bayesian Optimization techniques for hyperparameter tuning of your model.
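
For the augmentation suggestion, one detail matters for this task: if the image is transformed geometrically, the midpoint label has to be transformed in the same way. Below is a minimal sketch of a horizontal flip, assuming the image is a nested list of rows and the label is a normalized (x, y) pair. The function name is hypothetical; with NumPy arrays the flip would be `image[:, ::-1]`.

```python
def hflip_with_midpoint(image, midpoint):
    """Flip an image left-to-right and mirror its normalized midpoint label.

    image:    list of rows, each row a list of pixel values
    midpoint: (x, y) with both values in [0, 1]
    """
    # Reverse every row to mirror the image horizontally.
    flipped = [row[::-1] for row in image]
    x, y = midpoint
    # Only the x coordinate changes under a horizontal flip.
    return flipped, (1.0 - x, y)


image = [[1, 2, 3],
         [4, 5, 6]]
flipped, label = hflip_with_midpoint(image, (0.25, 0.5))
print(flipped, label)
```

Applying the flip to the image alone would leave the label pointing at the wrong side of the picture, which silently corrupts the training data.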

Here is a simple implementation of ResNet:

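# Imports assume TensorFlow 2.x with its bundled Keras.
from tensorflow import keras
from tensorflow.keras.layers import (Activation, AveragePooling2D, BatchNormalization,
                                     Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def unit(x, filters, pool=False):
    # Pre-activation residual unit: (BN -> ReLU -> Conv) twice, plus a shortcut.
    res = x
    if pool:
        # Halve the spatial size on the main path. padding='same' keeps the shape
        # compatible with the strided shortcut below when the input size is odd.
        x = MaxPooling2D(pool_size=(2, 2), padding='same')(x)
        # 1x1 strided convolution so the shortcut matches the new size and filter count.
        res = Conv2D(filters=filters, kernel_size=(1, 1), strides=(2, 2), padding='same', kernel_initializer='he_normal')(res)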
    out = BatchNormalization()(x)
    out = Activation('relu')(out)
    out = Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(out)

    out = BatchNormalization()(out)
    out = Activation('relu')(out)
    out = Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1), padding='same', kernel_initializer='he_normal')(out)

    x = keras.layers.add([res, out])
    return x

def model(inputs):
    inp = Input(inputs)
    x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_uniform')(inp)
    x = unit(x, 32)
    x = unit(x, 32)
    x = unit(x, 32)

    x = unit(x, 64, pool=True)
    x = unit(x, 64)
    x = unit(x, 64)

    x = unit(x, 128, pool=True)
    x = unit(x, 128)
    x = unit(x, 128)

    x = unit(x, 256, pool=True)
    x = unit(x, 256)
    x = unit(x, 256)

    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.25)(x)

    x = AveragePooling2D((3, 3))(x)
    x = Flatten()(x)
    x = Dense(2, activation='sigmoid')(x)

    model = Model(inputs=inp, outputs=x)
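    optimizer = Adam(learning_rate=0.001)  # `lr` is deprecated in recent Keras versions
    # model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    # Accuracy is not meaningful for coordinate regression, so track mean absolute error instead.
    model.compile(optimizer=optimizer, loss=keras.losses.MeanSquaredLogarithmicError(), metrics=['mae'])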
    return model    

Upvotes: 1

brbr

Reputation: 213

Your task is related to object detection. The difference is that you seem to have exactly one object in each of your images, whereas in detection there may be multiple objects or none. For object detection, there are networks such as YOLOv3 (https://pjreddie.com/media/files/papers/YOLOv3.pdf) or Single Shot MultiBox Detector - SSD (https://arxiv.org/pdf/1512.02325.pdf), but ResNet can also be trained as an object detection network (as in this paper on Faster R-CNN: https://arxiv.org/pdf/1506.01497.pdf).

I will briefly describe how YOLO solves the regression problem for the bounding box x,y coordinates:

  • YOLO uses a sigmoid activation function for x,y
  • It divides the image into grid cells and predicts offsets for a potential object in each grid cell. This may be helpful if you have large images or objects at multiple locations.
  • The original paper uses MSE as a loss function, but my favorite Keras reimplementation uses crossentropy loss with the Adam optimizer.
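
The grid-cell idea from the list above can be sketched as follows. This is a simplified illustration, not YOLO's actual code: it handles only a single midpoint, ignores box width and height, objectness, and anchors, and the function names and `grid_size` parameter are hypothetical. The point is that the offset inside a cell always lies in [0, 1], which is why a sigmoid output fits it.

```python
def encode_midpoint(x, y, grid_size):
    """Convert a normalized midpoint into (row, col, dx, dy), where (row, col)
    is the grid cell containing the point and (dx, dy) is the offset inside
    that cell, measured in cell units."""
    # Clamp so that x == 1.0 or y == 1.0 falls into the last cell.
    col = min(int(x * grid_size), grid_size - 1)
    row = min(int(y * grid_size), grid_size - 1)
    dx = x * grid_size - col
    dy = y * grid_size - row
    return row, col, dx, dy


def decode_midpoint(row, col, dx, dy, grid_size):
    """Invert encode_midpoint and return the normalized (x, y)."""
    return (col + dx) / grid_size, (row + dy) / grid_size


cell = encode_midpoint(0.5, 0.25, grid_size=4)
print(cell)
print(decode_midpoint(*cell, grid_size=4))
```

With only one object per image, as in the question, predicting the two coordinates directly is simpler; the grid becomes useful mainly when several objects can appear.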

In principle, your setup looks fine to me. But many things could cause poor performance, and you haven't described the domain of your dataset: Are you using a pretrained network or training from scratch? Is it a new object category that you are trying to learn, or one the network has seen before?

Here are some ideas which you could try:

  • change the optimizer (to SGD or Adam)
  • change the learning rate (better too small than too large)
  • increase your dataset size. For retraining a network on a new object category, my rule of thumb is to use about 500-1000 images. For training from scratch you need orders of magnitude more.
  • you may want to check out YOLO or SSD and modify those networks for your case

I hope you find some inspiration for your solution.

Upvotes: 1
