daydreamer

Reputation: 91949

Create multiple splits of a dataset with a single call to train_test_split

This is what I currently do:

# split data into training, cv and test sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split
train, intermediate_set = train_test_split(input_set, train_size=0.6, test_size=0.4)
cv, test = train_test_split(intermediate_set, train_size=0.5, test_size=0.5)


# report the shape and number of dimensions of each split
print('training shape(Tuple of array dimensions) = ', train.shape)
print('training dimension(Number of array dimensions) = ', train.ndim)
print('cv shape(Tuple of array dimensions) = ', cv.shape)
print('cv dimension(Number of array dimensions) = ', cv.ndim)
print('test shape(Tuple of array dimensions) = ', test.shape)
print('test dimension(Number of array dimensions) = ', test.ndim)

This gives me the following result:

training shape(Tuple of array dimensions) =  (25200, 785)
training dimension(Number of array dimensions) =  2
cv shape(Tuple of array dimensions) =  (8400, 785)
cv dimension(Number of array dimensions) =  2
test shape(Tuple of array dimensions) =  (8400, 785)
test dimension(Number of array dimensions) =  2
features shape =  (25200, 784)
labels shape =  (25200,)

How can I make this work in one command?

Upvotes: 2

Views: 3188

Answers (1)

ogrisel

Reputation: 40149

Read the source code of train_test_split and its companion class ShuffleSplit and adapt it to your use case. It's not a big function, so adapting it should not be very complicated.
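As a rough illustration of what such an adaptation could look like, here is a minimal sketch that shuffles the row indices once and cuts the shuffled list in two places. The helper name three_way_split_indices and its parameters are hypothetical (they are not part of scikit-learn), and the sketch uses only the standard library rather than scikit-learn's own code. The 42000 rows are inferred from the shapes in the question (25200 + 8400 + 8400).

```python
import random


def three_way_split_indices(n_samples, train_size=0.6, cv_size=0.2, seed=None):
    """Return (train_idx, cv_idx, test_idx) as three disjoint lists of row indices.

    Hypothetical helper, not part of scikit-learn. The test set receives
    whatever rows remain after the train and cv sets have been taken.
    """
    if train_size < 0 or cv_size < 0 or train_size + cv_size > 1:
        raise ValueError("train_size and cv_size must be non-negative and sum to at most 1")

    # Shuffle all row indices once, using a local RNG so the seed is reproducible.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Convert the fractions to row counts; clamp cv so rounding cannot overshoot.
    n_train = int(round(train_size * n_samples))
    n_cv = min(int(round(cv_size * n_samples)), n_samples - n_train)

    # Cut the shuffled index list in two places.
    train_idx = indices[:n_train]
    cv_idx = indices[n_train:n_train + n_cv]
    test_idx = indices[n_train + n_cv:]
    return train_idx, cv_idx, test_idx


train_idx, cv_idx, test_idx = three_way_split_indices(42000, seed=0)
print(len(train_idx), len(cv_idx), len(test_idx))  # prints: 25200 8400 8400
```

With a NumPy array such as input_set from the question, the three subsets can then be taken in one line: train, cv, test = (input_set[idx] for idx in three_way_split_indices(len(input_set))). Returning indices rather than data keeps the helper independent of the array type and makes it easy to apply the same split to features and labels.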

Upvotes: 1
