Reputation: 27
I am trying to adapt Python code for a Convolutional Neural Network (in Keras) with 8 classes to work on 2 classes. My problem is that I get the following error message:
ValueError: Error when checking target: expected activation_6 to have shape(None,2) but got array with shape (5760,1).
My Model is as follows (without the indentation issues):
class MiniVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # first CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same",
                         input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # second CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
Where classes = 2, and inputShape=(32,32,3).
I know that my error has something to do with my classes/use of binary_crossentropy and occurs in the model.fit line below, but haven't been able to figure out why it is problematic, or how to fix it.
By changing model.add(Dense(classes)) above to model.add(Dense(classes-1)) I can get the model to train, but then my labels size and target_names are mismatched, and I have only one category which everything is categorized as.
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.nn.conv import MiniVGGNet
from pyimagesearch.preprocessing import ImageToArrayPreprocessor
from pyimagesearch.preprocessing import SimplePreprocessor
from pyimagesearch.datasets import SimpleDatasetLoader
from keras.optimizers import SGD
#from keras.datasets import cifar10
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-o", "--output", required=True,
help="path to the output loss/accuracy plot")
args = vars(ap.parse_args())
# grab the list of images that we'll be describing
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
# initialize the image preprocessors
sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
# load the dataset from disk then scale the raw pixel intensities
# to the range [0, 1]
sdl = SimpleDatasetLoader(preprocessors=[sp, iap])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.astype("float") / 255.0
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
test_size=0.25, random_state=42)
# convert the labels from integers to vectors
trainY = LabelBinarizer().fit_transform(trainY)
testY = LabelBinarizer().fit_transform(testY)
# initialize the label names for the items dataset
labelNames = ["mint", "used"]
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=0.01, decay=0.01 / 10, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=2)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
batch_size=64, epochs=10, verbose=1)
print ("Made it past training")
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1), target_names=labelNames))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 10), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 10), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 10), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 10), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on items dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["output"])
I have looked at these questions already, but cannot work out how to get around this problem based on the responses.
Any advice or help would be much appreciated, as I've spent the last couple of days on this.
Upvotes: 0
Views: 637
Reputation: 27
Matt's comment was absolutely correct: the problem lay with using LabelBinarizer. That hint led me to a solution that did not require me to give up softmax or change the last layer to have classes = 1. For posterity and for others, here is the section of code that I altered, showing how I was able to avoid LabelBinarizer:
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder
# load the dataset from disk then scale the raw pixel intensities
# to the range [0,1]
sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)
data = data.astype("float") / 255.0
labels = np_utils.to_categorical(labels, 2)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
....
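For readers without Keras at hand, the two steps above (integer-encode, then one-hot) can be sketched in plain NumPy. This is only a stand-in under assumptions: `np.unique(..., return_inverse=True)` plays the role of `LabelEncoder`, indexing into `np.eye` plays the role of `to_categorical`, and the sample labels are made up for illustration.

```python
import numpy as np

# Hypothetical string labels standing in for the ones loaded from disk.
labels = np.array(["mint", "used", "used", "mint", "used"])

# Integer-encode: the classes come back sorted and the codes index into them.
classes, codes = np.unique(labels, return_inverse=True)

# One-hot encode: row i of the identity matrix is the vector for class i.
one_hot = np.eye(len(classes), dtype="float32")[codes]

print(classes)        # class names in sorted order
print(one_hot.shape)  # (5, 2): one column per class, matching Dense(2)
```

The point is the resulting shape: two columns per sample, which is what a two-unit softmax layer is compared against.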
Upvotes: 1
Reputation: 16394
I believe the problem lies in the use of LabelBinarizer.
From this example:
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
[0],
[0],
[1]])
I gather that the output of your transformation has the same format, i.e. a single 1 or 0 encoding "is new" or "is used".
If your problem only calls for classification among these two classes, that format is preferable because it contains all the information and uses less space than the alternative, i.e. [1,0], [0,1], [0,1], [1,0].
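The mismatch in the error message can be reproduced with array shapes alone. Here is a minimal NumPy sketch, assuming `y` is a single-column target like the one in the example: stacking `1 - y` next to `y` gives a two-column form that a two-unit softmax layer can be compared against. The variable names are illustrative, and which class gets which column is only a convention.

```python
import numpy as np

# Single-column targets, as LabelBinarizer returns for two classes.
y = np.array([[1], [0], [0], [1]])
print(y.shape)  # (4, 1): cannot be matched against a (None, 2) output

# Two-column form: column 0 marks class 0, column 1 marks class 1.
y_two_col = np.hstack((1 - y, y))
print(y_two_col.shape)  # (4, 2)
```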
Therefore, using classes = 1 would be correct, and the output should be a float indicating the network's confidence in a sample being in the first class. Since the two class probabilities have to sum to one, the probability of it being in the second class can easily be inferred by subtracting from 1.
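To make that concrete, here is a small sketch of reading a single-probability output. The probabilities are invented for illustration, and the 0.5 threshold is an assumption rather than something taken from the question. Note that `argmax(axis=1)` on an `(n, 1)` array can only return 0, which would explain every sample landing in a single category.

```python
import numpy as np

# Made-up outputs of a one-unit network: one probability per sample.
p_one = np.array([[0.9], [0.2], [0.6], [0.1]])

# The other class's probability follows by subtraction.
p_other = 1.0 - p_one

# argmax over a single column can only ever return index 0.
always_zero = p_one.argmax(axis=1)

# Thresholding recovers a usable 0/1 prediction instead.
predicted = (p_one > 0.5).astype(int).ravel()

print(always_zero)  # [0 0 0 0]
print(predicted)    # [1 0 1 0]
```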
You would need to replace softmax with another activation such as sigmoid, because softmax on a single value always returns 1. As far as I know, binary_crossentropy is intended for exactly this single-output case, though you could also try mean_squared_error as the loss.
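The claim about softmax is easy to check numerically. The sketch below uses hand-written NumPy versions of softmax and sigmoid (not the Keras implementations), and the logits are made up. Sigmoid is shown as one common replacement activation, which is an assumption about what would fit here.

```python
import numpy as np

def softmax(z):
    # Normalise exponentials along the last axis; subtract the max for stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    # Squash each raw score independently into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Raw scores from a hypothetical one-unit output layer, one row per sample.
logits = np.array([[-3.0], [0.0], [2.5]])

print(softmax(logits).ravel())  # [1. 1. 1.]
print(sigmoid(logits).ravel())  # values strictly between 0 and 1
```

With only one unit there is nothing for softmax to normalise against, so every row collapses to 1 regardless of the input.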
If you are looking to expand your model to cover more than two classes, you would want to convert your target vector to a one-hot encoding. With more than two classes, fit_transform from LabelBinarizer already returns one column per class (inverse_transform goes the other way, back to the original labels). I see that sklearn also has OneHotEncoder, which may be the more appropriate replacement.
NB: You can specify the activation function for any layer more easily with, for example:
Dense(36, activation='relu')
This may be helpful in keeping your code to a manageable size.
Upvotes: 0