Reputation: 317
I have an MNIST dataset and want to use 10% of it as a validation set. How can I do this in TensorFlow?
Upvotes: 0
Views: 258
Reputation: 262
You can use Dataset methods like dataset.take() and dataset.skip() to extract a part of the data and use it as you wish, for training, testing or validation.
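As a minimal sketch of the take/skip approach (the placeholder tensors below stand in for MNIST and are my assumption, as is the fixed num_examples; swap in your own dataset and size):

```python
import tensorflow as tf

# Placeholder data standing in for MNIST (assumption): 100 blank 28x28 images.
num_examples = 100
features = tf.zeros([num_examples, 28, 28])
labels = tf.range(num_examples)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Reserve the first 10% for validation and keep the rest for training.
val_size = num_examples // 10
val_ds = dataset.take(val_size)
train_ds = dataset.skip(val_size)

# Count the elements in each subset to confirm the split sizes.
num_val = sum(1 for _ in val_ds)
num_train = sum(1 for _ in train_ds)
print(num_val, num_train)
```

Note that this takes the first 10% in stored order, so it is only a fair split if the data is not sorted (for example by label).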
Alternatively, you can use scikit-learn to split the data twice: first into testing and (training + validation) data, and then split the (training + validation) data again into training and validation sets.
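from sklearn.model_selection import train_test_split
# features and labels are the full dataset (arrays of equal length)
# First split: 50% test, 50% (training + validation)
X_train_val, X_test, y_train_val, y_test = train_test_split(features, labels, test_size=0.5, random_state=5)
# Second split: 20% of the remaining 50% becomes the validation set
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.2, random_state=5)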
Remember that test_size in the second split is a fraction of the (training + validation) data, not of the complete dataset, so adjust it to get the percentage you actually want. Here, 20% of the remaining 50% is 10% of the total dataset.
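If it helps, the arithmetic for the second split can be written down as a small helper. This is only a sketch; second_split_size is a hypothetical name of mine, not part of scikit-learn:

```python
def second_split_size(val_fraction_of_total, test_fraction_of_total):
    """Return the test_size to use in the second split (hypothetical helper).

    The second split only sees what is left after the test set is removed,
    so the validation fraction has to be rescaled by that remainder.
    """
    remaining = 1.0 - test_fraction_of_total
    if not 0.0 < val_fraction_of_total < remaining:
        raise ValueError("validation fraction must fit inside the remaining data")
    return val_fraction_of_total / remaining


# 10% validation with a 50% test split, as in the answer above.
print(second_split_size(0.10, 0.50))
# 10% validation with a smaller 20% test split.
print(second_split_size(0.10, 0.20))
```

The first call corresponds to the 0.2 used in the split above; with a 20% test split you would need a smaller value for the same 10% validation set.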
Upvotes: 1
Reputation: 2919
You should consider sampling your validation data ahead of time, since validation data should be kept separate from the training data.
If you must sample your data in TensorFlow, consider using the shuffle method of the tf.data.Dataset object.
From the docs:
shuffle(buffer_size, seed=None, reshuffle_each_iteration=None)
Randomly shuffles the elements of this dataset.
Args:
buffer_size: A tf.int64 scalar tf.Tensor, representing the number of elements from this dataset from which the new dataset will sample.
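A rough sketch of how shuffle could be combined with a 10% hold-out. The range dataset is a placeholder of mine for real examples, and I am assuming the dataset size is known and small enough to use as the buffer size. Setting reshuffle_each_iteration=False matters here: otherwise the order changes on every pass and examples can move between the two subsets.

```python
import tensorflow as tf

# Placeholder dataset (assumption): the integers 0..99 stand in for real examples.
num_examples = 100
dataset = tf.data.Dataset.range(num_examples)

# A buffer as large as the dataset gives a full shuffle; fixing the seed and
# disabling reshuffling keeps the same order every time the dataset is read.
shuffled = dataset.shuffle(
    buffer_size=num_examples, seed=5, reshuffle_each_iteration=False
)

# Hold out 10% for validation and train on the rest.
val_size = num_examples // 10
val_ds = shuffled.take(val_size)
train_ds = shuffled.skip(val_size)

# Collect the element ids to check that the subsets do not overlap.
val_ids = {int(x) for x in val_ds.as_numpy_iterator()}
train_ids = {int(x) for x in train_ds.as_numpy_iterator()}
print(len(val_ids), len(train_ids), val_ids.isdisjoint(train_ids))
```

You can still apply a second shuffle to train_ds alone (with reshuffling enabled) if you want a fresh training order each epoch.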
Upvotes: 0