Hrushi

Reputation: 509

Google Colab not taking complete data from cifar10

from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

from tensorflow.keras.optimizers import SGD
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.datasets import cifar10

print("[INFO] loading CIFAR-10 data")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

print("trainX: {}, testX ={}".format(trainX.shape,testX.shape))

# convert the labels from integers to one-hot vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

labelNames = ["airplane", "automobile", "bird", "cat", "deer",
              "dog", "frog", "horse", "ship", "truck"]

print("[INFO] compiling model")
opt = SGD(lr=0.01, decay=0.01 / 40, momentum=0.9, nesterov=True)
# MiniVGGNet is defined elsewhere in the project (not shown in this snippet)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy",
              optimizer=opt, metrics=["accuracy"])

#train the network

print("[INFO] training network..")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
              batch_size=64, epochs=40, verbose=1)

The output is:

[INFO] loading CIFAR-10 data
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 4s 0us/step
trainX: (50000, 32, 32, 3), testX =(10000, 32, 32, 3)
[INFO] compiling model
[INFO] training network..
Epoch 1/40
782/782 [==============================] - 10s 12ms/step - loss: 1.6249 - accuracy: 0.4555 - val_loss: 1.3396 - val_accuracy: 0.5357
Epoch 2/40
782/782 [==============================] - 9s 12ms/step - loss: 1.1462

When I download the data from the website above, I get the correct CIFAR data, but when I run my model, I can see it only takes 782 images. I have worked on other models as well, with the same result. This only happens in Google Colab and not on my local PC. What am I missing?

Upvotes: 0

Views: 1915

Answers (1)

knoop

Reputation: 600

Both the training and testing sets are loading perfectly fine: the train set has 50000 images and the test set has 10000. So there is no problem in the code that you posted; consider adding the rest of the code that you used to train the model. You can check the shape of your sets by executing:

from tensorflow.keras.datasets import cifar10
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
train_X = train_X.astype("float") / 255.0
test_X = test_X.astype("float") / 255.0

print(f"train_X: {train_X.shape}, test_X = {test_X.shape}")

[Screenshot: perfectly fine shape of data]

Update:

Tested this in MyBinder, in my local Jupyter Notebook, and in Colab, and came to this conclusion:

MyBinder and the local Jupyter Notebook did not report progress over the CIFAR training set in mini-batches; they showed the total number of individual data points in the training set, so their progress bars counted up to 50000 steps at each epoch.

Google Colab, in contrast, reports progress in mini-batches. The dataset is split into batches of 64, so the total number of steps at each epoch is ceil(50000 / 64), which is equal to 782.
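
For reference, the 782 figure is just the batch count per epoch, and you can compute it directly (a minimal sketch, using the numbers from the training output above):

import math

train_samples = 50000  # size of the CIFAR-10 training set
batch_size = 64        # the batch_size passed to model.fit

# Keras shows one progress-bar step per mini-batch,
# and the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # 782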

[Screenshot: Mini Batch]

Hope this clears up the confusion. It was just that Colab displayed the total number of mini-batches, whereas the Jupyter Notebook showed the total number of individual entries in the set.
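
If you want to reproduce this yourself, here is a small self-contained sketch (it uses a throwaway one-layer model rather than the MiniVGGNet from the question, since only the reported step count matters here); with batch_size=64 the progress bar counts to 782, and with batch_size=128 it counts to 391:

import math

from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10

(train_X, train_y), _ = cifar10.load_data()
train_X = train_X.astype("float32") / 255.0

# A deliberately tiny model; its accuracy is irrelevant here.
model = models.Sequential([
    layers.Flatten(input_shape=(32, 32, 3)),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

for batch_size in (64, 128):
    expected = math.ceil(len(train_X) / batch_size)
    print(f"batch_size={batch_size} -> expecting {expected} steps per epoch")
    model.fit(train_X, train_y, batch_size=batch_size, epochs=1, verbose=1)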

PS: You might want to add the missing bracket at the end of line 34 in the code that you shared here.

Upvotes: 1
