Reputation: 5117
I am working with Python, TensorFlow and Keras to run an autoencoder on 450x450 RGB front-facing images of watches (e.g. watch_1). My goal is to use the encoded representations of these images, which are generated by the autoencoder, and compare them to find the most similar watches among them (see the sketch at the end of this question for how I intend to do the comparison). For now, I am using 1500 RGB images as I do not have a GPU yet, only a PC with 26GB of RAM.
My source code is the following:
from keras.layers import Input, Dense
from keras.models import Model
import cv2
import numpy as np
from sklearn import preprocessing
from glob import glob
data = []
number = 1500
i = 0
for filename in glob('Watches/*.jpg'):
    img = cv2.imread(filename)
    height, width, channels = img.shape

    # Flatten each 450x450x3 image into a single 1D vector
    if height == 450 and width == 450:
        img = np.concatenate(img, axis=0)
        img = np.concatenate(img, axis=0)
        data.append(img)
    else:
        print('These are not the correct dimensions')

    i = i + 1
    if i > number:
        break
# Normalise data
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)
# Size of our encoded representations
encoding_dim = 250
# Input placeholder
input_img = Input(shape=(width * height * channels,))
# Encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# Lossy reconstruction of the input
decoded = Dense(width * height * channels, activation='sigmoid')(encoded)
# Autoencoder model in all
autoencoder = Model(input_img, decoded)
# Compile the model
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy'])
autoencoder.summary()  # summary() prints directly and returns None
# Train the model
length = len(data)
data_train = data[:int(0.7*length)]
data_test = data[int(0.7*length):]
autoencoder.fit(data_train, data_train, epochs=10, batch_size=50, shuffle=True, validation_data=(data_test, data_test))
I am getting the following results in brief:
Epoch 1/10
loss: 0.6883 - acc: 0.0015 - val_loss: 0.6883 - val_acc: 0.0015
Epoch 2/10
loss: 0.6883 - acc: 0.0018 - val_loss: 0.6883 - val_acc: 0.0018
# I omit the other epochs for the sake of brevity
Epoch 10/10
loss: 0.6883 - acc: 0.0027 - val_loss: 0.6883 - val_acc: 0.0024
The accuracy is very low.
Is this because I use a relatively small number of images, or because there is a problem with my source code?
If the problem is the number of images, then how many images are needed to achieve an accuracy above 80%?
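For context, here is roughly how I intend to do the comparison once the autoencoder is trained. This is an untested sketch: it reuses input_img, encoded and data from the code above, and most_similar is just a placeholder name I made up for the ranking helper.
# Build a standalone encoder that maps an image to its 250-dim encoding,
# reusing the layers already trained inside the autoencoder
encoder = Model(input_img, encoded)
encodings = encoder.predict(data)  # shape: (n_images, encoding_dim)

def most_similar(index, encodings, top_k=5):
    # Rank all other watches by cosine similarity to watch `index`
    query = encodings[index]
    norms = np.linalg.norm(encodings, axis=1) * np.linalg.norm(query)
    sims = encodings.dot(query) / (norms + 1e-10)
    ranked = np.argsort(-sims)
    return [j for j in ranked if j != index][:top_k]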
Upvotes: 1
Views: 1856
Reputation: 10437
So, to elaborate on my answer after reading the blog post you linked in the comments: your implementation is actually correct, but you don't want to evaluate the autoencoder by itself.
Autoencoders are a dimension reduction technique, so whatever output the autoencoder generates is always going to be lossy. You can evaluate how well the autoencoder works by adding it as a layer to a neural network that actually does the classification. The lossy representations then become the input to that subsequent network, whose last layer should use the softmax activation. Then you can evaluate the accuracy of that network.
Think of autoencoders as a preprocessing step for dimension reduction, similar to principal components analysis.
from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
model.add(autoencoder.layers[1])  # here is where you add your trained encoder layer
model.add(Dense(10, activation='softmax'))  # assumes 10 watch classes
model.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
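To then actually train and score that classifier, the usage would look something like the following. Note that labels_train and labels_test are placeholders here; you would have to build one-hot encoded class labels for your watches yourself.
# Hypothetical usage; labels_train / labels_test are assumed one-hot label arrays
model.fit(data_train, labels_train, epochs=10, batch_size=50,
          validation_data=(data_test, labels_test))
loss, acc = model.evaluate(data_test, labels_test)
print('classification accuracy:', acc)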
Upvotes: 1