Reputation: 393
I just posted about another problem with the same code, but progress is slow because I know very little about what I'm doing. The previous problem is here: Keras ValueError: No gradients provided for any variable
I'm currently trying to get my model to run in order to classify 5000 events, each of which is a 2D numpy array of 29x29 values.
I define my NN like so:
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(29,29,1))
x = inputs
x = keras.layers.Conv2D(16, kernel_size=(3,3), name='Conv_1')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_1')(x)
x = keras.layers.Conv2D(16, kernel_size=(3,3), name='Conv_2')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_2')(x)
x = keras.layers.Conv2D(32, kernel_size=(3,3), name='Conv_3')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_3')(x)
x = keras.layers.Flatten(name='Flatten')(x)
x = keras.layers.Dense(64, name='Dense_1')(x)
x = keras.layers.ReLU(name='ReLU_dense_1')(x)
x = keras.layers.Dense(64, name='Dense_2')(x)
x = keras.layers.ReLU(name='ReLU_dense_2')(x)
outputs = keras.layers.Dense(4, activation='softmax', name='Output')(x)
model = keras.Model(inputs=inputs, outputs=outputs, name='VGGlike_CNN')
model.summary()
keras.utils.plot_model(model, show_shapes=True)
LR_ST = 1e-3  # starting learning rate (value not shown in the post; assumed here)
OPTIMIZER = tf.keras.optimizers.Adam(learning_rate=LR_ST)
model.compile(optimizer=OPTIMIZER,
              loss='categorical_crossentropy',
              metrics=['accuracy'],
              run_eagerly=False)
def lr_decay(epoch):
    if epoch < 10:
        return LR_ST
    else:
        return LR_ST * tf.math.exp(0.2 * (10 - epoch))
lr_scheduler = keras.callbacks.LearningRateScheduler(lr_decay)
model_checkpoint = keras.callbacks.ModelCheckpoint(
    filepath='mycnn_best',
    monitor='val_accuracy',
    save_weights_only=True,
    save_best_only=True,
    save_freq='epoch')
callbacks = [ lr_scheduler, model_checkpoint ]
print('X_train.shape = ',X_train.shape)
history = model.fit(X_train, Y_train, epochs=50,
                    validation_data=(X_test, Y_test),
                    shuffle=True, verbose=1,
                    callbacks=callbacks)
It now gives me the error: ValueError: Shapes (32, 2) and (32, 4) are incompatible.
I want to classify each of the events as having 1, 2, 3 or 4 clusters, but before working on something complex, I'm using events which I know only have 1 cluster, so the label for each event is 1.
All of this gives me the idea that the problem has to do with my output layer having 4 neurons, but I really don't know if that's true, nor do I know how to go about debugging the code.
If anyone could help me I'd be really grateful.
Upvotes: 4
Views: 5635
Reputation: 3294
Your model.summary() clearly shows what your model expects at each step. In particular, your model expects input of shape (batch_size, 29, 29, 1) and produces output of shape (batch_size, 4). If your label data is not of shape (batch_size, 4), the loss function will raise an error when it compares the two. You can verify this directly, as in the check below.
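For example (a minimal check, assuming your training arrays are named X_train and Y_train as in the question):

# Compare the model's output shape with the label shape;
# for categorical_crossentropy the last dimensions must agree.
print("Model output shape:", model.output_shape)  # (None, 4)
print("Label shape:", Y_train.shape)              # should be (num_samples, 4)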
It is unclear how you are labeling your data. You say some data is labeled as a "1", but it's not clear what that means. Data can be labeled in one of two ways:
A -> 0
B -> 1
C -> 2
D -> 1
or
A -> (1,0,0)
B -> (0,1,0)
C -> (0,0,1)
D -> (0,1,0)
The second set is known as "one hot encoding" your labels. Keras can work directly with integer labels (starting at 0) if you use the "sparse_categorical_crossentropy" loss function, so you don't need to one hot encode them yourself. If you manually one hot encode your data, then you use the "categorical_crossentropy" loss function instead (it sounds like you may have done this).
The difference between these labeling methods comes from the fact that integer labels place your classes on a continuum if they are ever compared numerically: a label of "2" is closer to a label of "1" than to a label of "0", so an algorithm that treated the labels as numbers would not consider guessing "2" for a true "1" as bad a guess as guessing "2" for a true "0". By "one hot encoding" all the data, we have made all the labels an equal distance apart from each other, so there is no favoritism in the labels.
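As a small illustration (a sketch using keras.utils.to_categorical, with made-up integer labels matching the mapping above):

import numpy as np
from tensorflow import keras

labels = np.array([0, 1, 2, 1])  # integer labels for data points A, B, C, D
one_hot = keras.utils.to_categorical(labels)  # 3 classes -> length-3 vectors
print(one_hot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]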
I always recommend testing your model on random data to make sure everything is working. For example,
import numpy as np
from tensorflow import keras

# Random data with the same shapes as the real problem:
# 10000 samples of 29x29x1 inputs, 4 one hot encoded classes.
X = np.random.random((10000,29,29,1))
Y = np.random.randint(0,4,size=10000)
Y = keras.utils.to_categorical(Y)
print(f"Input: {X.shape}, Output: {Y.shape}")

model = keras.models.Sequential([
    keras.layers.Conv2D(16,(3,3),activation=keras.layers.LeakyReLU(0.1)),
    keras.layers.MaxPool2D((2,2)),
    keras.layers.Conv2D(16,(3,3),activation=keras.layers.LeakyReLU(0.1)),
    keras.layers.MaxPool2D((2,2)),
    keras.layers.Conv2D(32,(3,3),activation=keras.layers.LeakyReLU(0.1)),
    keras.layers.MaxPool2D((2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64,'relu'),
    keras.layers.Dense(64,'relu'),
    keras.layers.Dense(4,'softmax'),  # 4 outputs to match the 4 label columns
])
model.compile('adam','categorical_crossentropy')
model.fit(X,Y)
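If this runs cleanly, the architecture and loss agree with each other, and any remaining shape error in your real run must come from how your actual labels are prepared.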
Upvotes: 1
Reputation: 1134
The issue comes from the difference between the shape of your labels and the output shape of your model. Since you are using categorical_crossentropy and your output layer has 4 units, your model expects labels in one hot encoded form, as vectors of length 4. However, your labels are vectors of length 2. Therefore, if your labels are integers, you can do

Y_train = tf.one_hot(Y_train, 4)

and the resulting shape will be (5000, 4).
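Alternatively (a sketch, assuming you keep integer class labels starting at 0), you could leave Y_train as integers and switch to the sparse loss instead:

model.compile(optimizer=OPTIMIZER,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])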
Upvotes: 7