prax1telis

Reputation: 317

How can I sample 10% of a dataset in TensorFlow?

I have an MNIST dataset and want to use 10% of it as a validation set. How can I do this in TensorFlow?

Upvotes: 0

Views: 258

Answers (2)

Akshay Kumar

Reputation: 262

You can use Dataset methods like dataset.take() and dataset.skip() to extract a part of the data and use it as you wish, for training, testing or validation.
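As a minimal sketch of the take()/skip() approach, assuming TensorFlow 2.x and using a stand-in dataset of 100 integers rather than the real MNIST data (the names num_examples, val_ds and train_ds are only illustrative):

```python
import tensorflow as tf

# Stand-in for the real data: 100 integer elements (an assumption for illustration).
num_examples = 100
dataset = tf.data.Dataset.range(num_examples)

# Reserve the first 10% of the elements for validation, keep the rest for training.
val_size = num_examples // 10
val_ds = dataset.take(val_size)
train_ds = dataset.skip(val_size)

# Materialize both parts to check their sizes.
val_items = [int(x) for x in val_ds.as_numpy_iterator()]
train_items = [int(x) for x in train_ds.as_numpy_iterator()]
print(len(val_items), len(train_items))
```

Note that this takes the first 10% in the dataset's existing order, so it is only a fair sample if that order is already random.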

Alternatively, you can use scikit-learn to split the data twice: first into testing and (training + validation) data, and then the (training + validation) data again into separate training and validation sets.

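import sklearn.model_selection as sk

# First split: 50% testing, 50% (training + validation)
X_train_val, X_test, y_train_val, y_test = sk.train_test_split(features, labels, test_size=0.5, random_state=5)

# Second split: 20% of the (training + validation) half becomes validation
X_train, X_val, y_train, y_val = sk.train_test_split(X_train_val, y_train_val, test_size=0.2, random_state=5)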

Remember that test_size in the second split is a fraction of the train_val data, not of the complete dataset, so adjust it to get the percentage you want. Here, 20% of the remaining 50% gives a validation set that is 10% of the complete dataset.
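The arithmetic can be sketched as a small helper. The function name second_split_test_size is hypothetical and not part of scikit-learn; it only computes the value you would pass as test_size in the second split:

```python
def second_split_test_size(val_fraction_of_total, first_test_size):
    """Return the test_size for the second split so that the validation set
    ends up as val_fraction_of_total of the complete dataset."""
    remaining = 1.0 - first_test_size  # share left after the first split
    if not 0.0 < val_fraction_of_total < remaining:
        raise ValueError("validation fraction must fit inside the remaining data")
    return val_fraction_of_total / remaining

# The case from this answer: 50% held out for testing, 10% of the total for validation.
second_size = second_split_test_size(0.10, 0.5)

# A smaller test set changes the answer: with 20% held out for testing,
# 10% of the total is 12.5% of what remains.
other_size = second_split_test_size(0.10, 0.2)
print(second_size, other_size)
```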

Upvotes: 1

zglin

Reputation: 2919

You should consider downsampling your data ahead of time, since validation data should be kept separate from the training data.

If you must sample your data in TensorFlow, consider using the shuffle method of the TensorFlow Dataset object. From the docs:

shuffle(buffer_size, seed=None, reshuffle_each_iteration=None)

Randomly shuffles the elements of this dataset.

Args:
buffer_size: A tf.int64 scalar tf.Tensor, representing the number of elements from this dataset from which the new dataset will sample.
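A hedged sketch of how shuffle could be used to draw a random 10%, assuming TensorFlow 2.x and a stand-in dataset of 100 integers. Two choices here are assumptions of this sketch, not part of the quoted docs: the buffer is set to the full dataset size so the shuffle is uniform, and reshuffle_each_iteration=False is passed so the order stays the same every time the dataset is iterated. Without that, the sampled part and the rest could overlap.

```python
import tensorflow as tf

# Stand-in for the real data: 100 integer elements (an assumption for illustration).
num_examples = 100
dataset = tf.data.Dataset.range(num_examples)

# Shuffle once with a buffer covering the whole dataset, and keep that order
# fixed across iterations so the two parts below never share elements.
shuffled = dataset.shuffle(
    buffer_size=num_examples, seed=5, reshuffle_each_iteration=False
)

# Sample 10% at random for validation; the remainder is the training data.
val_size = num_examples // 10
val_ds = shuffled.take(val_size)
train_ds = shuffled.skip(val_size)

val_ids = {int(x) for x in val_ds.as_numpy_iterator()}
train_ids = {int(x) for x in train_ds.as_numpy_iterator()}
print(len(val_ids), len(train_ids))
```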

Upvotes: 0
