Fabio

Reputation: 645

First CNN and shapes error

I just started building my first CNN. I'm practicing with the MNIST dataset, and this is the code I wrote:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dropout, Flatten, Dense
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import RobustScaler
import os
import numpy as np
import matplotlib.pyplot as plt

# CONSTANTS
EPOCHS = 300
TIME_STEPS = 30000
NUM_CLASSES = 10

# Loading data
print('Loading data:')
(train_X, train_y), (test_X, test_y) = mnist.load_data()
print('X_train: ' + str(train_X.shape))
print('Y_train: ' + str(train_y.shape))
print('X_test:  ' + str(test_X.shape))
print('Y_test:  ' + str(test_y.shape))
print('------------------------------')

# Splitting train/val
print('Splitting training/validation set:')
X_train = train_X[0:TIME_STEPS, :]
X_val = train_X[TIME_STEPS:TIME_STEPS*2, :]
print('X_train: ' + str(X_train.shape))
print('X_val: ' + str(X_val.shape))


# Normalizing data
print('------------------------------')
print('Normalizing data:')
X_train = X_train/255
X_val = X_val/255
print('X_train: ' + str(X_train.shape))
print('X_val: ' + str(X_val.shape))


# Building model
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=5, input_shape=(28, 28)))
model.add(Conv1D(filters=16, kernel_size=4, activation="relu"))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))

model.compile(optimizer=Adam(), loss=categorical_crossentropy, metrics=['accuracy'])
model.summary()

model.fit(x=X_train, y=X_train, batch_size=10, epochs=EPOCHS, shuffle=False)

I'll explain what I did; any correction would be helpful so I can learn more:

  1. First, I split the training set into two parts: a training part and a validation part, which I would like to use during training before evaluating on the test set.
  2. Then I normalized the data (is this standard practice when working with images?).
  3. I then built my CNN with a simple structure. The first layer takes the inputs (with dimension 28x28), and I chose 32 filters, which should be enough to perform well on this dataset. The kernel size is the part I did not understand, since I thought the kernel was the equivalent of the filter. I selected a low number to avoid problems. The second layer is similar to the first, but it has an activation function (relu, though I'm not convinced; I was thinking of using a softmax to pass a set of probabilities to the fully connected layer).
  4. The last 3 layers form the fully connected part that produces the output.

In the fit function I used a batch size of 10, and I think this could be one of the reasons I get the error:

ValueError: Shapes (10, 28, 28) and (10, 10) are incompatible

Even after removing it, I still get the following error:

ValueError: Shapes (None, 28, 28) and (None, 10) are incompatible

Am I missing something important?

Upvotes: 1

Views: 90

Answers (1)

Oxbowerce

Reputation: 430

You are passing in the X_train variable twice, once as the x argument and once as the y argument. Instead of passing X_train as the y argument in .fit(), you should pass an array of the values you are trying to predict. Given that you are using MNIST, I assume you are trying to predict the written digit, so your y array should have shape (n_samples, 10), with each digit one-hot encoded. That is what the error is telling you: the loss is comparing targets of shape (10, 28, 28) with model outputs of shape (10, 10).
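As a minimal sketch of the one-hot encoding step, assuming the labels are the integer digits returned by mnist.load_data(): the helper name one_hot and the demo_labels array are made up for illustration and use only NumPy. tensorflow.keras.utils.to_categorical should give an equivalent result.

```python
import numpy as np

NUM_CLASSES = 10


def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer labels of shape (n,) into one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]


# Hypothetical stand-in for a few entries of train_y.
demo_labels = np.array([5, 0, 4, 1, 9])
demo_one_hot = one_hot(demo_labels)
print(demo_one_hot.shape)

# With the real data from the question (not executed here), the labels must be
# sliced the same way as the images so the rows line up:
# y_train = one_hot(train_y[0:TIME_STEPS])
# y_val = one_hot(train_y[TIME_STEPS:TIME_STEPS * 2])
# model.fit(x=X_train, y=y_train, validation_data=(X_val, y_val),
#           batch_size=10, epochs=EPOCHS)
```

If you would rather keep the labels as plain integers, switching the loss to sparse_categorical_crossentropy should also work, since that loss expects integer class indices instead of one-hot rows.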

Upvotes: 1
