DanielSon

Reputation: 1545

What is the structure of the data and labels in tensorflow.examples.tutorials.mnist input_data

I'm trying to learn how to feed data to conv nets properly in TensorFlow, and most example code uses from tensorflow.examples.tutorials.mnist import input_data.

That is convenient for accessing MNIST data, but it doesn't help when I try to work out the equivalent way to structure and feed non-MNIST data to similar models.

What is the structure of the data imported through the MNIST examples, so that I can use example CNN walkthrough code and reshape my own data to mirror the structure of the MNIST data?

Upvotes: 0

Views: 773

Answers (1)

mrry

Reputation: 126154

The format of the MNIST data obtained from that example code depends on exactly how you initialize the DataSet class. Calling DataSet.next_batch(batch_size) returns two NumPy arrays, representing batch_size images and labels respectively. They have the following formats.

  • If the DataSet was initialized with reshape=True (the default), the images array is a batch_size by 784 matrix, in which each row contains the pixels of one 28 by 28 MNIST image, flattened. With the default dtype of tf.float32, the array holds float32 values, and the values are pixel intensities between 0.0 and 1.0.

  • If the DataSet was initialized with reshape=False, the images array is a 4-dimensional array of shape batch_size by 28 by 28 by 1. The two 28s correspond to the height and width of each image in pixels; the 1 corresponds to the number of channels in the images, which are grayscale and so have only a single channel.

  • If the DataSet was initialized with one_hot=False (the default), the labels array is a vector of length batch_size, in which each value is the label (an integer from 0 to 9) representing the digit in the respective image.

  • If the DataSet was initialized with one_hot=True, the labels array is a batch_size by 10 matrix, in which each row is all zeros, except for a 1 in the column that corresponds to the label of the respective image.
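To mirror these four layouts with your own data, something like the following NumPy sketch may help. It is only an illustration: to_mnist_layout is a hypothetical helper (not part of TensorFlow), and it assumes your images are single-channel uint8 arrays of shape (num_images, height, width) with integer labels in the range 0 to num_classes - 1. The random arrays stand in for real data.

```python
import numpy as np


def to_mnist_layout(images, labels, num_classes, reshape=True, one_hot=False):
    """Hypothetical helper: arrange grayscale data like the MNIST arrays.

    images: uint8 array of shape (num_images, height, width), values 0..255
    labels: integer array of shape (num_images,), values 0..num_classes-1
    """
    images = np.asarray(images)
    labels = np.asarray(labels, dtype=np.int64)
    num_images, height, width = images.shape

    # Scale pixel intensities from [0, 255] to [0.0, 1.0] as float32.
    images = images.astype(np.float32) / 255.0

    if reshape:
        # One flattened image per row: (num_images, height * width).
        images = images.reshape(num_images, height * width)
    else:
        # Keep the spatial layout and add a channel axis:
        # (num_images, height, width, 1).
        images = images.reshape(num_images, height, width, 1)

    if one_hot:
        # One row per image: all zeros except a 1 in the label's column.
        encoded = np.zeros((num_images, num_classes), dtype=np.float32)
        encoded[np.arange(num_images), labels] = 1.0
        labels = encoded

    return images, labels


# Placeholder data standing in for a real dataset (an assumption).
rng = np.random.RandomState(0)
raw_images = rng.randint(0, 256, size=(5, 28, 28)).astype(np.uint8)
raw_labels = np.array([3, 0, 9, 1, 3])

flat_images, int_labels = to_mnist_layout(raw_images, raw_labels, num_classes=10)
grid_images, hot_labels = to_mnist_layout(
    raw_images, raw_labels, num_classes=10, reshape=False, one_hot=True)

print(flat_images.shape, int_labels.shape)   # (5, 784) (5,)
print(grid_images.shape, hot_labels.shape)   # (5, 28, 28, 1) (5, 10)
```

Slicing batch_size rows from arrays built this way gives batches with the same shapes as those described above.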

Note that if you are interested in convolutional networks, initializing the DataSet with reshape=False is probably what you want, since that will retain spatial information about the images that will be used by the convolutional operators.
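If you already have a flattened batch (the reshape=True layout), the spatial layout can usually be recovered with a plain reshape. A minimal sketch, assuming each row stores its pixels in row-major order and using a zero-filled placeholder batch rather than real MNIST data:

```python
import numpy as np

batch_size = 4

# Placeholder batch in the flattened layout: batch_size by 784.
flat_batch = np.zeros((batch_size, 784), dtype=np.float32)
flat_batch[0, 28 * 5 + 7] = 1.0  # pixel at row 5, column 7 of image 0

# Recover height, width and a single channel axis; -1 infers the batch size.
image_batch = flat_batch.reshape(-1, 28, 28, 1)

print(image_batch.shape)  # (4, 28, 28, 1)
```

The resulting batch_size by 28 by 28 by 1 array has the same shape as the batches produced with reshape=False.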

Upvotes: 2
