There are 250 randomly generated data points that are obtained as follows:
[X, y] = getDataSet() # getDataSet() randomly generates 250 data points
X looks like:
[array([[-2.44141527e-01, 8.39016956e-01],
[ 1.37468561e+00, 4.97114860e-01],
[ 3.08071887e-02, -2.03260255e-01],...
While y looks like:
y is array([[0.],
[0.],
[0.],...
(it also contains 1s)
So, I'm trying to split [X, y] into training and testing sets. The training set is supposed to be a random selection of 120 of the randomly generated data points. Here is how I'm generating the training set:
nTrain = 120
maxIndex = len(X)
randomTrainingSamples = np.random.choice(maxIndex, nTrain, replace=False)
trainX = X[randomTrainingSamples, :] # training samples
trainY = y[randomTrainingSamples, :] # labels of training samples nTrain X 1
Now, what I can't seem to figure out is how to get the testing set, which is the other 130 randomly generated data points that are not included in the training set:
testX = # testing samples
testY = # labels of testing samples nTest x 1
Suggestions are much appreciated. Thank you!
Upvotes: 0
Views: 1364
Reputation: 36
You can try this.
randomTestingSamples = [i for i in range(maxIndex) if i not in randomTrainingSamples]
testX = X[randomTestingSamples, :] # testing samples
testY = y[randomTestingSamples, :] # labels of testing samples nTest x 1
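The same complement can also be taken with a boolean mask, which avoids the per-index membership test in the list comprehension. This is only a sketch: the random arrays stand in for getDataSet(), and the seed and the name testMask are my own assumptions, not part of the question.

```python
import numpy as np

# Stand-in for getDataSet(): 250 random points with 0/1 labels (assumed data)
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
X = rng.normal(size=(250, 2))
y = rng.integers(0, 2, size=(250, 1)).astype(float)

nTrain = 120
maxIndex = len(X)
randomTrainingSamples = rng.choice(maxIndex, nTrain, replace=False)

# Start with every row marked as "test", then unmark the training rows
testMask = np.ones(maxIndex, dtype=bool)
testMask[randomTrainingSamples] = False

trainX = X[randomTrainingSamples, :]  # training samples
trainY = y[randomTrainingSamples, :]  # labels of training samples, nTrain x 1
testX = X[testMask, :]                # testing samples
testY = y[testMask, :]                # labels of testing samples, nTest x 1
```

Note that the masked test rows keep their original order, while the training rows follow the order of the sampled indices.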
Upvotes: 1
Reputation: 1564
You can shuffle the indices and pick the first 120 as the training set and the remaining 130 as the test set:
random_index = np.random.permutation(len(X)) # np.random.shuffle works in place and returns None, so use permutation
randomTrainingSamples = random_index[:120]
randomTestSamples = random_index[120:250]
trainX = X[randomTrainingSamples, :]
trainY = y[randomTrainingSamples, :]
testX = X[randomTestSamples, :]
testY = y[randomTestSamples, :]
Upvotes: 0
Reputation: 29742
You can use sklearn.model_selection.train_test_split:
import numpy as np
from sklearn.model_selection import train_test_split
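X, y = np.ndarray((250, 2)), np.ndarray((250, 1)) # placeholder arrays with the same shapes as the real data
trainX, testX, trainY, testY = train_test_split(X, y, test_size=130) # an integer test_size is an absolute number of samples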
trainX.shape
# (120, 2)
testX.shape
# (130, 2)
trainY.shape
# (120, 1)
testY.shape
# (130, 1)
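If you need the same split on every run, or want both sets to keep roughly the same proportion of 0s and 1s, train_test_split also takes random_state and stratify. A rough sketch, again with random stand-in data instead of getDataSet(); the seed values are arbitrary assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for getDataSet(): 250 random points with 0/1 labels (assumed data)
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 2))
y = rng.integers(0, 2, size=(250, 1)).astype(float)

trainX, testX, trainY, testY = train_test_split(
    X, y,
    test_size=130,       # absolute number of test samples
    random_state=42,     # fixes the shuffle so the split is repeatable
    stratify=y.ravel(),  # keep the label proportions similar in both sets
)
```

Leaving out stratify gives a plain random split, which matches what the question asks for; stratifying is optional.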
Upvotes: 3