Reputation: 248
I have the following scheme:
training_filenames = filenames[split:]
validation_filenames = filenames[:split]
Now I want to create x_train, y_train, x_test and y_test (for hyperparameter tuning). How is it done properly?
There are different classes available (CLASSES = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']).
Upvotes: 0
Views: 1464
Reputation: 671
To get the data as NumPy arrays, you should first load the data using the load_dataset() function from the example. This returns a tf.data.TFRecordDataset (see the documentation). Looking at the display utilities code in your example, there is a function that does exactly what you want to do, except it only extracts the first N input-output pairs:
def dataset_to_numpy_util(dataset, N):
    # Batch the first N examples together, then pull that single batch
    # out of the tf.data pipeline as NumPy arrays.
    dataset = dataset.batch(N)
    for images, labels in dataset:
        numpy_images = images.numpy()
        numpy_labels = labels.numpy()
        break  # only the first batch of N examples is needed
    return numpy_images, numpy_labels
You should be able to do the following:
X_train, y_train = dataset_to_numpy_util(load_dataset(training_filenames), len(training_filenames))
X_test, y_test = dataset_to_numpy_util(load_dataset(testing_filenames), len(testing_filenames))
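As a usage sketch only (the model below is hypothetical and not part of the example), the resulting arrays can then be fed to a compiled Keras model during your hyperparameter tuning runs:
# `model` is any compiled Keras model you are tuning (hypothetical here).
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=10,
    batch_size=32,
)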
Upvotes: 0
Reputation: 671
It seems the classes aren't that important here. I assume each file contains an image and a label (that is what the example you refer to suggests). As I understand it, you want not only train and validation data but train, validation and test data.
You should be able to do so like this:
# Splits of your choice; this split leaves 10% of the data for testing.
TRAINING_SPLIT = 0.7
VALIDATION_SPLIT = 0.2

train_end = int(len(filenames) * TRAINING_SPLIT)
validation_end = int(len(filenames) * (TRAINING_SPLIT + VALIDATION_SPLIT))

training_filenames = filenames[:train_end]
validation_filenames = filenames[train_end:validation_end]
testing_filenames = filenames[validation_end:]
Then you can proceed to read in the data as is done in the example. Once that is done, you should be able to extract the X (image) and y (label) data for each split if needed, as sketched below.
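A minimal sketch of that last step, assuming the load_dataset() helper from the example and the dataset_to_numpy_util() function from my other answer are available, and that each split is small enough to fit in memory:
# Hypothetical names: load_dataset() comes from the example,
# dataset_to_numpy_util() from the other answer; adapt as needed.
X_train, y_train = dataset_to_numpy_util(load_dataset(training_filenames), len(training_filenames))
X_val, y_val = dataset_to_numpy_util(load_dataset(validation_filenames), len(validation_filenames))
X_test, y_test = dataset_to_numpy_util(load_dataset(testing_filenames), len(testing_filenames))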
Upvotes: 1