Reputation: 1057
I am very new to machine learning and I started implementing a Siamese network to check the similarity level of handwritten digits, training on the MNIST dataset, but I am having a serious loss problem.
import keras
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Lambda
from keras.models import Sequential, Model
from keras.optimizers import Adam
import keras.backend as K
import cv2
from keras.datasets import mnist
import numpy as np
import random
def siameseNet(input_shape):
    input1 = Input(input_shape)
    input2 = Input(input_shape)

    # Shared convolutional base, applied to both inputs
    model = Sequential()
    model.add(Conv2D(50, (5,5), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D())
    model.add(Conv2D(100, (3,3), activation='relu'))
    model.add(MaxPooling2D())
    model.add(Conv2D(100, (3,3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(2048, activation='sigmoid'))

    input_model_1 = model(input1)
    input_model_2 = model(input2)

    # Element-wise absolute difference between the two embeddings
    distance_func = Lambda(lambda t: K.abs(t[0] - t[1]))
    distance_layer = distance_func([input_model_1, input_model_2])
    prediction = Dense(1, activation='sigmoid')(distance_layer)

    network = Model(inputs=[input1, input2], outputs=prediction)
    return network
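For reference, the network itself builds without errors; a minimal sketch to inspect it (assuming the function above) is:

# Hypothetical smoke test: build the network and print its layers to confirm
# both inputs pass through the single shared Sequential base
net = siameseNet((28, 28, 1))
net.summary()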
My pairs object is a numpy array holding two arrays of images paired by index: in the first half of the pairs, both images belong to the same category; in the second half, to different categories. The category object is a simple array with one entry per pair, its first half set to 0 (the Y value meaning the two images show the same digit) and its second half set to 1. Both pairs and category are populated in the following function:
INPUT_SHAPE = (28,28,1)
def loadData():
    (X_train, Y_train), _ = mnist.load_data()
    n_samples = 20000

    arrPairs = [np.zeros((n_samples, INPUT_SHAPE[0], INPUT_SHAPE[1], INPUT_SHAPE[2])) for i in range(2)]
    category = np.zeros((n_samples))
    category[n_samples//2:] = 1

    for i in range(n_samples):
        if i % 1000 == 0:
            print(i)
        cur_category = Y_train[i]

        # First image of the pair: a random sample of the current digit
        img = random.choice(X_train[Y_train==cur_category]).reshape(28,28,1)
        _, img = cv2.threshold(img, .8, 1, cv2.THRESH_BINARY)
        arrPairs[0][i] = img.reshape(28,28,1)

        # Second image: same digit for category 0, different digit for category 1
        if category[i] == 1:
            img = random.choice(X_train[Y_train!=cur_category])
        else:
            img = random.choice(X_train[Y_train==cur_category])
        _, img = cv2.threshold(img, .8, 1, cv2.THRESH_BINARY)
        arrPairs[1][i] = img.reshape(28,28,1)

    arrPairs[0] = arrPairs[0]/255
    return arrPairs, category
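As an aside, the layout described above can be double-checked with a quick sketch (assuming loadData as defined):

# Hypothetical sanity check of the pair construction
pairs, category = loadData()
print(pairs[0].shape, pairs[1].shape)  # (20000, 28, 28, 1) each
print(category[:2], category[-2:])     # [0. 0.] ... [1. 1.]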
pairs, category = loadData()
model = siameseNet(INPUT_SHAPE)
model.compile(optimizer=Adam(lr=0.0005),loss="binary_crossentropy")
model.fit(pairs, category, epochs=5, verbose=1, validation_split=0.2)
Train on 16000 samples, validate on 4000 samples
Epoch 1/5
16000/16000 [==============================] - 6s 353us/step - loss: 0.6660 - val_loss: 0.9474
Epoch 2/5
16000/16000 [==============================] - 5s 287us/step - loss: 0.6628 - val_loss: 0.9335
Epoch 3/5
16000/16000 [==============================] - 5s 287us/step - loss: 0.6627 - val_loss: 0.8487
Epoch 4/5
16000/16000 [==============================] - 5s 287us/step - loss: 0.6625 - val_loss: 0.9954
Epoch 5/5
16000/16000 [==============================] - 5s 288us/step - loss: 0.6616 - val_loss: 0.9133
But no matter what I try, the loss won't decrease, and consequently the model predicts incorrectly. I tried changing the activations and increasing and decreasing the network complexity (adding and removing layers, as well as changing the Conv2D parameters), but none of that worked, so I'm guessing it is an architectural problem that I am missing.
Update: Lines used for testing:
test_pairs = [np.zeros((2, INPUT_SHAPE[0], INPUT_SHAPE[1], INPUT_SHAPE[2])) for i in range(2)]
test_pairs[0][0] = cv2.cvtColor(cv2.imread('test1_samenumber.png'), cv2.COLOR_BGR2GRAY).reshape(28,28,1)
test_pairs[1][0] = cv2.cvtColor(cv2.imread('test2_samenumber.png'), cv2.COLOR_BGR2GRAY).reshape(28,28,1)

pred = model.predict(test_pairs)
print(pred)
Which outputted:
[[0.32230237]
[0.44603676]]
Upvotes: 1
Views: 1433
Reputation: 104555
You have an unnecessary normalization when loading in your data. Specifically, for the first image of each pair, you are dividing by 255 when that isn't required. After you threshold with cv2.threshold, the output values are inherently 0 or 1, so further dividing by 255 makes the dynamic range smaller than that of the second image of each pair, which may cause a problem in learning how to differentiate between the two images. I've removed this normalization by commenting out the arrPairs[0] = arrPairs[0] / 255 statement.
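To make the scale mismatch concrete, here is a minimal sketch (using a made-up 1x3 image) of what cv2.threshold with a maxval of 1 produces, and what the extra division then does to it:

import numpy as np
import cv2

# Hypothetical 1x3 uint8 "image", just to illustrate the value ranges involved
img = np.array([[0, 5, 200]], dtype=np.uint8)

# THRESH_BINARY with maxval=1: every pixel above the threshold becomes 1
_, binarized = cv2.threshold(img, .8, 1, cv2.THRESH_BINARY)
print(binarized)        # [[0 1 1]] -- already in {0, 1}
print(binarized / 255)  # [[0. 0.00392157 0.00392157]] -- needlessly tiny values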
After I trained your network, I ran through each of the pairs and examined the output prediction. Essentially, if the category is 1 and the prediction generated by the network (your sigmoid layer) is at least 0.5, I count this as a correct prediction. Similarly, when the category is 0 and the prediction is smaller than 0.5, this is also correct.
correct = 0
for i in range(len(pairs[0])):
    output = model.predict([pairs[0][i][None], pairs[1][i][None]])[0][0]
    if (category[i] == 1 and output >= 0.5) or (category[i] == 0 and output < 0.5):
        correct += 1

print(correct / len(pairs[0]))
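As a side note, the same check can be done in a single batched predict call, which is much faster than predicting one pair at a time (a sketch assuming the same pairs and category arrays):

# One forward pass over all 20000 pairs instead of a Python loop
preds = model.predict(pairs).ravel()
accuracy = np.mean((preds >= 0.5).astype(int) == category.astype(int))
print(accuracy)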
I get a 99.26% accuracy here, meaning that 0.74% of the 20000 samples, or about 148 samples, were incorrectly classified. I'd say that's a good result.
Reproducible Google Colab notebook can be found here: https://colab.research.google.com/drive/10Q6rjuiytRSump2nulW5UhXY_PJh1eor
Upvotes: 1