Adler Müller

Reputation: 248

Create X_test, X_train, Y_test, Y_train in tensorflow

I have the following scheme:

training_filenames = filenames[split:]
validation_filenames = filenames[:split]

see: https://colab.research.google.com/github/GoogleCloudPlatform/training-data-analyst/blob/master/courses/fast-and-lean-data-science/04_Keras_Flowers_transfer_learning_solution.ipynb#scrollTo=M3G-2aUBQJ-H

Now I want to create x_train, y_train, x_test, and y_test (for hyperparameter tuning). How is this done properly?

There are different classes available (CLASSES = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips'])

Upvotes: 0

Views: 1464

Answers (2)

Gilles Ottervanger

Reputation: 671

To get the data as NumPy arrays, first load it with the load_dataset() function from the example. This returns a tf.data.TFRecordDataset. The display utilities code in your example has a function that does almost exactly what you want, except that it only extracts the first N input-output pairs:

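def dataset_to_numpy_util(dataset, N):
  # Group the first N (image, label) pairs into a single batch.
  dataset = dataset.batch(N)

  # Only the first batch is needed, so stop after one iteration.
  for images, labels in dataset:
    numpy_images = images.numpy()
    numpy_labels = labels.numpy()
    break

  return numpy_images, numpy_labels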

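You should be able to do the following. Note that the second argument is the number of examples to extract, so if a file holds more than one image, len(training_filenames) is too small and you need to pass the total number of images instead: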

X_train, y_train = dataset_to_numpy_util(load_dataset(training_filenames), len(training_filenames))
X_test, y_test = dataset_to_numpy_util(load_dataset(testing_filenames), len(testing_filenames))
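
If the number of examples is not known up front, one alternative is to iterate over the whole dataset and stack the results. The following is only a minimal sketch, assuming all images share one shape and everything fits in memory. collect_to_numpy is a hypothetical helper, not part of the notebook, and the fake data below only stands in for real examples. With a tf.data dataset, the iterable passed in could be dataset.as_numpy_iterator().

```python
import numpy as np


def collect_to_numpy(pairs):
    """Stack every (image, label) pair from an iterable into two arrays."""
    images, labels = [], []
    for image, label in pairs:
        images.append(np.asarray(image))
        labels.append(np.asarray(label))
    if not images:
        raise ValueError("no examples found")
    # np.stack adds a leading axis, so N images of shape (H, W, C)
    # become one array of shape (N, H, W, C).
    return np.stack(images), np.stack(labels)


# Stand-in data: four fake 2x2 RGB "images" with integer labels.
fake_pairs = [(np.zeros((2, 2, 3), dtype=np.uint8), i % 2) for i in range(4)]
X, y = collect_to_numpy(fake_pairs)
print(X.shape, y.shape)
```

For large datasets this may use a lot of memory, in which case keeping the data as a tf.data pipeline is probably the better option.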

Upvotes: 0

Gilles Ottervanger

Reputation: 671

It seems like the classes aren't that important here. I assume each file contains an image and a label (the example you refer to suggests so). As I understand it, you want training, validation, and test data, not just training and validation data.

You should be able to do so like this:

# splits of your choice; this split leaves 10% of the data for testing
TRAINING_SPLIT = .7
VALIDATION_SPLIT = .2

training_filenames = filenames[:int(len(filenames) * TRAINING_SPLIT)]
validation_filenames = filenames[int(len(filenames) * TRAINING_SPLIT):int(len(filenames) * (TRAINING_SPLIT + VALIDATION_SPLIT))]
testing_filenames = filenames[int(len(filenames) * (TRAINING_SPLIT + VALIDATION_SPLIT)):]

Then you can read in the data as is done in the example. Once that is done, you should be able to extract the X (image) and y (label) data if needed.
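
The slicing above can also be wrapped in a small function. This is a sketch rather than code from the example: split_filenames, its parameters, and the file names are hypothetical. It adds two things the snippet above does not do: it shuffles a copy of the list with a fixed seed, and it uses round() so that floating-point error in the fractions cannot shift a boundary by one.

```python
import random


def split_filenames(filenames, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle a copy of filenames and split it into train/val/test lists."""
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(filenames)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    # Whatever is left after the training and validation slices is test data.
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


# Hypothetical file names, used only to demonstrate the split sizes.
names = [f"file_{i:02d}.tfrec" for i in range(20)]
train, val, test = split_filenames(names)
print(len(train), len(val), len(test))
```

Each of the three lists can then be passed to load_dataset() in the same way as training_filenames in the example.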

Upvotes: 1
