Reputation: 2170
I have a very simple question related to transfer learning and the VGG16 NN.
Here is my code:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications
from keras import optimizers
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
img_width, img_height = 150, 150
top_model_weights_path = 'full_hrct_model_weights.h5'
train_dir = 'hrct_data_small/train'
validation_dir = 'hrct_data_small/validation'
nb_train_samples = 3000
nb_validation_samples = 600
epochs = 50
batch_size = 20
def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)

    # build the vgg16 model
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator = datagen.flow_from_directory(
        train_dir,
        target_size=(img_width, img_height),
        shuffle=False,
        class_mode=None,
        batch_size=batch_size
    )
    bottleneck_features_train = model.predict_generator(generator=generator, steps=nb_train_samples // batch_size)
    np.save(file="bottleneck_features_train_ternary_class.npy", arr=bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_dir,
        target_size=(img_width, img_height),
        shuffle=False,
        class_mode=None,
        batch_size=batch_size,
    )
    bottleneck_features_validation = model.predict_generator(generator, nb_validation_samples // batch_size)
    np.save(file="bottleneck_features_validate_ternary_class.npy", arr=bottleneck_features_validation)

save_bottleneck_features()
"Found 3000 images belonging to 3 classes."
"Found 600 images belonging to 3 classes."
def train_top_model():
    train_data = np.load(file="bottleneck_features_train_ternary_class.npy")
    train_labels = np.array([0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))
    validation_data = np.load(file="bottleneck_features_validate_ternary_class.npy")
    validation_labels = np.array([0] * (nb_validation_samples // 2) + [1] * (nb_validation_samples // 2))

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))  # no need to give the batch size in the input shape
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))  # rate assumed; the summary printed below shows a Dropout layer here
    model.add(Dense(3, activation='sigmoid'))

    model.summary()

    model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(validation_data, validation_labels))
    model.save_weights(top_model_weights_path)

train_top_model()
The error I get is this:
ValueError Traceback (most recent call last)
<ipython-input-52-33db5c28e162> in <module>()
2 epochs=epochs,
3 batch_size=batch_size,
----> 4 validation_data=(validation_data, validation_labels))
/Users/simonalice/anaconda/lib/python3.5/site-packages/keras/models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
854 class_weight=class_weight,
855 sample_weight=sample_weight,
--> 856 initial_epoch=initial_epoch)
857
858 def evaluate(self, x, y, batch_size=32, verbose=1,
/Users/simonalice/anaconda/lib/python3.5/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
1427 class_weight=class_weight,
1428 check_batch_axis=False,
-> 1429 batch_size=batch_size)
1430 # Prepare validation data.
1431 if validation_data:
/Users/simonalice/anaconda/lib/python3.5/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
1307 output_shapes,
1308 check_batch_axis=False,
-> 1309 exception_prefix='target')
1310 sample_weights = _standardize_sample_weights(sample_weight,
1311 self._feed_output_names)
/Users/simonalice/anaconda/lib/python3.5/site-packages/keras/engine/training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
137 ' to have shape ' + str(shapes[i]) +
138 ' but got array with shape ' +
--> 139 str(array.shape))
140 return arrays
141
ValueError: Error when checking target: expected dense_32 to have shape (None, 3) but got array with shape (3000, 1)
Here is the model summary:
Layer (type) Output Shape Param #
=================================================================
flatten_16 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_31 (Dense) (None, 256) 2097408
_________________________________________________________________
dropout_16 (Dropout) (None, 256) 0
_________________________________________________________________
dense_32 (Dense) (None, 3) 771
=================================================================
Total params: 2,098,179
Trainable params: 2,098,179
Non-trainable params: 0
I suspect my difficulty here highlights a fundamental misunderstanding on my part, so I need a very straightforward explanation. I have 3 classes in this training: 'hrct_data_small/train' contains 3 folders and 'hrct_data_small/validation' contains 3 folders.
First: Am I correct in thinking that the last layer of the top model:
model.add(Dense(3, activation='sigmoid'))
should be "3" as I have 3 classes.
Second:
I grabbed the data shapes to investigate
train_data = np.load(file="bottleneck_features_train_ternary_class.npy")
train_labels = np.array([0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))
validation_data =np.load(file="bottleneck_features_validate_ternary_class.npy")
validation_labels = np.array([0] * (nb_validation_samples // 2) + [1] * (nb_validation_samples // 2))
Then
print("Train data shape", train_data.shape)
print("Train_labels shape", train_labels.shape)
print("Validation_data shape", validation_labels.shape)
print("Validation_labels", validation_labels.shape)
And the result is
Train data shape (3000, 4, 4, 512)
Train_labels shape (3000,)
Validation_data shape (600, 4, 4, 512)
Validation_labels shape (600,)
So, should the "Train data shape" variable be in the shape (3000, 3).
My apologies for these basic questions - if I can simply get some clear thinking on this I's be grateful.
EDIT: So thanks to the advice of Naseem below, I addressed all of his points except one: the bottleneck features come back in directory order, so the first class comes first (1000 samples), then the second (1000), then the third (1000). Therefore the train_labels need to be in that order:
train_data = np.load(file="bottleneck_features_train_ternary_class.npy")
train_labels = np.array([0] * 1000 + [1] * 1000 + [2] * 1000)
validation_data = np.load(file="bottleneck_features_validate_ternary_class.npy")
validation_labels = np.array([0] * 400 + [1] * 400 + [2] * 400)
I then fixed the labels like so (note the np_utils import):
from keras.utils import np_utils

train_labels = np_utils.to_categorical(train_labels, 3)
validation_labels = np_utils.to_categorical(validation_labels, 3)
Which got the labels into the right shape and one-hot encoded them. I examined the first few and they were correct. The model then worked.
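For anyone checking the same thing, here is a quick way to see what to_categorical produces (a minimal sketch; the sample labels here are made up):
import numpy as np
from keras.utils import np_utils

sample_labels = np.array([0, 1, 2])  # one made-up label per class
print(np_utils.to_categorical(sample_labels, 3))
# each row is a one-hot vector of length 3:
# label 0 -> [1, 0, 0], label 1 -> [0, 1, 0], label 2 -> [0, 0, 1]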
As an additional comment - all of the answers were in the Keras documentation. If I had spent a bit more time reading and less time cutting and pasting code, I would have got it right. Lesson learned.
Upvotes: 2
Views: 1972
Reputation: 11543
I'm not sure this will clear everything up in your mind, but here are some errors I see in your code:
The way you create your labels is really weird to me. Why do you put half of the data as 0 and the other half as 1 in:
train_labels = np.array([0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))
This doesn't seem right, and you need to explain a bit more what you are trying to predict. Normally your labels should be produced by the generator: set class_mode='categorical' instead of class_mode=None. This makes the generator output both inputs and targets, where the targets are a series of one-hot encoded vectors of length 3.
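As a rough sketch of that change (reusing the variable names from your question; the printed shape assumes your batch size of 20 and 3 classes):
datagen = ImageDataGenerator(rescale=1. / 255)
generator = datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    shuffle=False,
    class_mode='categorical',  # yields (inputs, targets) instead of inputs only
    batch_size=batch_size
)
x_batch, y_batch = next(generator)
print(y_batch.shape)  # (20, 3): one one-hot vector of length 3 per image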
The loss you are using is loss='binary_crossentropy'. This is used when you are classifying images that can belong to multiple categories at once, or when you have only 2 possible classes. That is not your case (if I understand correctly). You should use loss='categorical_crossentropy', which is for when each image has one, and no more than one, class as its target.
This is linked to the previous point: the activation of the last layer, model.add(Dense(3, activation='sigmoid')). The sigmoid allows your output to be [1 1 0], [1 1 1], or [0 0 0], which are not valid in your case: you want to predict only one class, not have your image classified as belonging to all 3 classes. What we use for this classification case is a softmax. The softmax normalizes the outputs so that they sum to 1. You can then interpret the output as probabilities: given [0.1 0.2 0.7], the image has a 10% probability of belonging to the first class, 20% to the second class, and 70% to the third. So I would change to: model.add(Dense(3, activation='softmax'))
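Putting the loss and activation points together, a minimal sketch of the corrected top model (layer sizes taken from your question; the Dropout rate is an assumption):
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))  # train_data as loaded in your code
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))  # rate assumed
model.add(Dense(3, activation='softmax'))  # probabilities over the 3 classes
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # exactly one class per image
              metrics=['accuracy'])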
So to summarize: the network complains because, for each image, it expects the target you provide to be a one-hot vector of length 3, encoding the class the image belongs to. What you are currently feeding it is just a single number, 0 or 1.
Does it make more sense?
Upvotes: 5