Lleims

Reputation: 1353

Using Cifar-10 dataset from tfds.load() correctly

I'm trying to use the Cifar-10 dataset to practice my CNN skills.

Loading it this way works fine:

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

But I was trying to use tfds.load() and I don't understand how to do it.

This downloads it:

train_ds, test_ds = tfds.load('cifar10', split=['train','test'])

Then I tried this, but it does not work:

assert isinstance(train_ds, tf.data.Dataset)
assert isinstance(test_ds, tf.data.Dataset)
(train_images, train_labels) = tuple(zip(*train_ds))
(test_images, test_labels) = tuple(zip(*test_ds))

Can somebody show me the way to achieve it?

Thank you!

Upvotes: 3

Views: 5471

Answers (2)

Innat

Reputation: 17219

You can do this as follows.

import tensorflow as tf 
import tensorflow_datasets as tfds

train_ds, test_ds = tfds.load('cifar10', split=['train','test'], as_supervised=True)

Both train_ds and test_ds are tf.data.Dataset objects, so you can call map, batch, and similar methods on each of them.

def normalize_resize(image, label):
    image = tf.cast(image, tf.float32)
    image = tf.divide(image, 255)
    image = tf.image.resize(image, (28, 28))
    return image, label

def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_saturation(image, 0.7, 1.3)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, 0.1)
    return image, label 

train = train_ds.map(normalize_resize).cache().map(augment).shuffle(100).batch(64).repeat()
test = test_ds.map(normalize_resize).cache().batch(64)

Now, we can pass train and test directly to model.fit.

model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28, 3)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10, activation="softmax"),
        ]
    )

model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)
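model.fit(
    train,
    epochs=5,
    # train repeats forever, so the epoch length is set here;
    # the CIFAR-10 training split has 50,000 images
    steps_per_epoch=50000 // 64,
    validation_data=test, verbose=2
)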
Epoch 1/5
17s 17ms/step - loss: 2.0848 - accuracy: 0.2318 - val_loss: 1.8175 - val_accuracy: 0.3411
Epoch 2/5
11s 12ms/step - loss: 1.8827 - accuracy: 0.3144 - val_loss: 1.7800 - val_accuracy: 0.3595
Epoch 3/5
11s 12ms/step - loss: 1.8383 - accuracy: 0.3272 - val_loss: 1.7152 - val_accuracy: 0.3904
Epoch 4/5
11s 11ms/step - loss: 1.8129 - accuracy: 0.3397 - val_loss: 1.6908 - val_accuracy: 0.4060
Epoch 5/5
11s 11ms/step - loss: 1.8022 - accuracy: 0.3461 - val_loss: 1.6801 - val_accuracy: 0.4081
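
Because train ends in .repeat(), it never runs out of data, so steps_per_epoch is what defines the length of an epoch. A quick sanity check of that number, as a minimal sketch assuming the CIFAR-10 training split of 50,000 images and the batch size of 64 used above (the variable names are only illustrative):

```python
# CIFAR-10's training split holds 50,000 images (the test split holds 10,000).
NUM_TRAIN_EXAMPLES = 50_000
BATCH_SIZE = 64  # same batch size as in the pipeline above

# Number of full batches in one pass over the training data.
steps_per_epoch = NUM_TRAIN_EXAMPLES // BATCH_SIZE
# Images left over that do not fill a final batch.
leftover = NUM_TRAIN_EXAMPLES % BATCH_SIZE

print(steps_per_epoch, leftover)  # prints: 781 16
```

A larger value still trains, since the dataset repeats, but each "epoch" then covers more than one pass over the data.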

Upvotes: 6

Frightera

Reputation: 5079

You can also extract them like this:

train_ds, test_ds = tfds.load('cifar10', split=['train','test'], 
                               as_supervised = True, 
                               batch_size = -1)

To get the data as NumPy arrays with tfds.as_numpy(), pass as_supervised and batch_size as shown. With as_supervised=True each element is an (inputs, labels) tuple; otherwise it is a dictionary. batch_size=-1 returns the whole split as a single batch.

With them you simply call:

train_images, train_labels = tfds.as_numpy(train_ds)
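
Keep in mind that this puts the entire split in memory at once. A rough estimate of the size of the image array, assuming CIFAR-10's 50,000 training images of 32x32 pixels with 3 channels stored as uint8 (one byte per value); the helper names are only illustrative:

```python
# Assumed CIFAR-10 training split dimensions.
num_images, height, width, channels = 50_000, 32, 32, 3

# One byte per value while the images are still uint8.
uint8_bytes = num_images * height * width * channels
# Four bytes per value once the images are cast to float32.
float32_bytes = uint8_bytes * 4

print(uint8_bytes / 2**20)    # size in MiB as uint8
print(float32_bytes / 2**20)  # size in MiB as float32
```

That is about 146 MiB as uint8 and about 586 MiB after a cast to float32, which is manageable for CIFAR-10 but may not be for larger datasets.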

Alternatively, iterate over the dataset and collect the images and labels yourself (assuming batch_size is not passed).

With as_supervised = False:

train_images, train_labels = [],[]

for images_labels in train_ds:
    train_images.append(images_labels['image'])
    train_labels.append(images_labels['label'])

With as_supervised = True:

train_images, train_labels = [], []

for images, labels in train_ds:
    train_images.append(images)
    train_labels.append(labels)
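
The same difference in element structure is likely why tuple(zip(*train_ds)) from the question does not behave as expected when as_supervised is left at its default. A minimal sketch using plain Python stand-ins (hypothetical values, not real tensors or a real tf.data.Dataset):

```python
# Stand-ins for dataset elements: dictionaries (default) vs. tuples (as_supervised=True).
dict_elements = [{"image": "img0", "label": 3}, {"image": "img1", "label": 7}]
tuple_elements = [("img0", 3), ("img1", 7)]

# Iterating a dict yields its keys, so zip(*...) groups key names, not the data.
from_dicts = tuple(zip(*dict_elements))

# Tuples transpose cleanly into (all images, all labels).
images, labels = tuple(zip(*tuple_elements))

print(from_dicts)      # (('image', 'image'), ('label', 'label'))
print(images, labels)  # ('img0', 'img1') (3, 7)
```

With tuple elements the zip idiom does transpose the data, but it still builds Python tuples of individual items, so tfds.as_numpy with batch_size=-1 is usually the simpler route.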

Upvotes: 2
