Reputation: 363
Final objective: Object Midpoint calculation.
I have a small dataset (around 120 images), each containing the same object, and the labels are the normalized x, y coordinates of the object's midpoint in the image (always between 0 and 1).
e.g. x = image_005 ; y = (0.1, 0.15) for an image with the object placed near the bottom left corner
I am trying to use a ResNet architecture customized for my image size (all images have the same dimensions). Since the output values are always between 0 and 1 for both coordinates, I was wondering whether I can use a sigmoid activation in my last layer:
X = Dense(2, activation='sigmoid', name='fc', kernel_initializer=glorot_uniform(seed=0))(X)
instead of a linear activation (as is often advised when you are trying to achieve a regression result).
For the loss function I use MSE with the 'rmsprop' optimizer, and in addition to accuracy and MSE I have written a custom metric that tells me whether the predicted points are off from the labels by more than 5%:
model.compile(optimizer='rmsprop', loss='mean_squared_error', metrics=['mse','acc',perc_midpoint_err])
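A minimal sketch of such a metric (the exact implementation isn't shown here; this assumes "off by more than 5%" means a Euclidean distance greater than 0.05 in normalized coordinates):

from tensorflow.keras import backend as K

def perc_midpoint_err(y_true, y_pred):
    # Fraction of predictions whose Euclidean distance from the label
    # exceeds 0.05 in normalized coordinates ("off by more than 5%").
    dist = K.sqrt(K.sum(K.square(y_true - y_pred), axis=-1))
    return K.mean(K.cast(dist > 0.05, 'float32'))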
I am not getting good results after training the model for around 150 epochs (I experimented with different batch sizes too).
Should I change the activation layer to linear? Or is there a different modification I can do to my model? Or is ResNet completely unsuitable for this task?
Upvotes: 1
Views: 2938
Reputation: 1633
Apart from what you have done, there are lots of other things you can do. Here is a simple implementation of a ResNet:
from tensorflow import keras
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, BatchNormalization,
                                     Activation, Dropout, AveragePooling2D, Flatten,
                                     Dense, add)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def unit(x, filters, pool=False):
    # Residual unit: two 3x3 convolutions plus a shortcut connection.
    res = x
    if pool:
        # Halve the spatial resolution on the main path, and match it on the
        # shortcut with a strided 1x1 convolution (which also adjusts channels).
        x = MaxPooling2D(pool_size=(2, 2))(x)
        res = Conv2D(filters=filters, kernel_size=(1, 1), strides=(2, 2),
                     padding='same', kernel_initializer='he_normal')(res)
    out = BatchNormalization()(x)
    out = Activation('relu')(out)
    out = Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1),
                 padding='same', kernel_initializer='he_normal')(out)
    out = BatchNormalization()(out)
    out = Activation('relu')(out)
    out = Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1),
                 padding='same', kernel_initializer='he_normal')(out)
    x = add([res, out])  # shortcut: add the residual to the block output
    return x

def model(inputs):
    inp = Input(inputs)
    x = Conv2D(32, (3, 3), padding='same', kernel_initializer='he_uniform')(inp)
    x = unit(x, 32)
    x = unit(x, 32)
    x = unit(x, 32)
    x = unit(x, 64, pool=True)
    x = unit(x, 64)
    x = unit(x, 64)
    x = unit(x, 128, pool=True)
    x = unit(x, 128)
    x = unit(x, 128)
    x = unit(x, 256, pool=True)
    x = unit(x, 256)
    x = unit(x, 256)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.25)(x)
    x = AveragePooling2D((3, 3))(x)
    x = Flatten()(x)
    # Two sigmoid outputs for the normalized (x, y) midpoint coordinates.
    x = Dense(2, activation='sigmoid')(x)
    model = Model(inputs=inp, outputs=x)
    optimizer = Adam(learning_rate=0.001)
    # For classification you would use cross-entropy instead:
    # model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    model.compile(optimizer=optimizer,
                  loss=keras.losses.MeanSquaredLogarithmicError(),
                  metrics=['accuracy'])
    return model
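A hypothetical way to build and train it (the 128x128 RGB input shape and the train_images / train_midpoints arrays are assumptions, since the question doesn't give the image dimensions):

net = model((128, 128, 3))
net.summary()
net.fit(train_images, train_midpoints, epochs=150, batch_size=16)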
Upvotes: 1
Reputation: 213
Your task is related to object detection. The difference is that you seem to have exactly one object in each image, whereas in detection there may be multiple objects or none at all. For object detection there are networks such as YOLOv3 (https://pjreddie.com/media/files/papers/YOLOv3.pdf) or the Single Shot MultiBox Detector, SSD (https://arxiv.org/pdf/1512.02325.pdf), but a ResNet can also be trained as an object detection network (as in this paper: https://arxiv.org/pdf/1506.01497.pdf).
I will briefly describe how YOLO solves the regression problem for the bounding-box x, y coordinates: the network does not predict the coordinates directly but raw values t_x, t_y, which are passed through a sigmoid so that the predicted center always lands inside the responsible grid cell: b_x = sigma(t_x) + c_x and b_y = sigma(t_y) + c_y, where c_x, c_y are the cell offsets. So a sigmoid output for normalized coordinates is an established choice.
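A small sketch of that decoding step (illustrative only; names and the grid normalization are paraphrased from the YOLOv3 paper, not your model):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_center(t_x, t_y, c_x, c_y, grid_size):
    # The sigmoid confines the offset to [0, 1] within cell (c_x, c_y);
    # dividing by the grid size yields image-normalized coordinates.
    b_x = (sigmoid(t_x) + c_x) / grid_size
    b_y = (sigmoid(t_y) + c_y) / grid_size
    return b_x, b_y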
In principle your setup looks fine to me. But there are many things that could cause poor performance, since you don't tell us much about your dataset: Are you using a pretrained network or training from scratch? Is the object a new category, or one the network has seen before? etc.
Here are some ideas you could try: start from a pretrained backbone and fine-tune it rather than training from scratch, augment your small dataset (e.g. flips and translations, adjusting the label coordinates accordingly, as sketched below), and lower the learning rate or reduce the model capacity so you don't overfit 120 images.
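For example, a hypothetical horizontal-flip augmentation for this midpoint task (mirroring the image turns the normalized x coordinate into 1 - x):

import numpy as np

def flip_horizontal(image, midpoint):
    # Mirror the image left-right and correct the normalized label.
    x, y = midpoint
    return np.fliplr(image), (1.0 - x, y)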
I hope you find some inspiration for your solution.
Upvotes: 1