Reputation: 133
I have a question about splitting a dataset in Python. If I already have a subset of the dataset as the training set, is there a function in Python that can exclude the training set from the dataset and return the rest directly? Something like:
testing_set = numpy.exclude(dataset, trainingset)
For example, there are 10 rows in the dataset and I have taken rows 2, 4, 7 and 9 as the training set, so how can I easily get the rest of the dataset? In detail, this is how I build my training dataset:
for i in range(0,5):
    Test_data = dataset[ratio*i:ratio*(i+1),:]
    Train_data = dataset[0:ratio*i&ratio*(i+1):-1,:]
My code didn't work because the & operator is not defined for combining slices this way.
Upvotes: 1
Views: 255
Reputation: 4680
If you already know the indices of the training set rows, you can just exclude them to get the indices of the remaining rows (this assumes dataset is a pandas DataFrame):
training_rows_ix = [2,4,7,9]
non_training_rows = [i for i in dataset.index if i not in training_rows_ix]
test_set = dataset.loc[non_training_rows]
Or using set operations instead of list comprehension:
non_training_rows = sorted(set(dataset.index) - set(training_rows_ix))
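Since the question slices dataset like a NumPy array, the same idea can be sketched without pandas. This is only a minimal illustration: the toy array, the value of ratio, and the names mask, test_set_alt, test_fold and train_fold are assumptions made up for the example, not taken from the question.

```python
import numpy as np

# Toy dataset: 10 rows, 3 columns (hypothetical data for illustration).
dataset = np.arange(30).reshape(10, 3)
training_rows_ix = [2, 4, 7, 9]

# Option 1: a boolean mask that is False for the training rows.
mask = np.ones(len(dataset), dtype=bool)
mask[training_rows_ix] = False
train_set = dataset[training_rows_ix]
test_set = dataset[mask]

# Option 2: np.delete returns a copy with the given rows removed.
test_set_alt = np.delete(dataset, training_rows_ix, axis=0)

# For contiguous folds, as in the question's loop, the training rows are
# everything before the test slice plus everything after it.
ratio = 2  # rows per fold (assumed here: 10 rows / 5 folds)
for i in range(5):
    test_fold = dataset[ratio * i : ratio * (i + 1)]
    train_fold = np.concatenate(
        [dataset[: ratio * i], dataset[ratio * (i + 1) :]]
    )
```

The mask is handy when you want both halves from one index list; np.delete is shorter when you only need the remainder.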
Also, for a more robust way to split a dataset into training and test sets, look into scikit-learn's train_test_split.
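As a rough sketch of the scikit-learn route, assuming scikit-learn is installed and using a made-up toy array: train_test_split gives one random split, and KFold (not mentioned above, but close to the loop in the question) yields the train and test indices for each fold.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Toy dataset: 10 rows, 3 columns (hypothetical data for illustration).
dataset = np.arange(30).reshape(10, 3)

# One random split: 4 training rows, the remaining 6 rows for testing.
train_set, test_set = train_test_split(
    dataset, train_size=0.4, random_state=0
)

# 5-fold split: each fold holds out 2 contiguous rows as the test set
# (KFold does not shuffle by default) and trains on the other 8.
kfold = KFold(n_splits=5)
folds = [
    (dataset[train_ix], dataset[test_ix])
    for train_ix, test_ix in kfold.split(dataset)
]
```

Each entry of folds is a (train, test) pair, so the manual slicing in the question is not needed.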
Upvotes: 2