Shun7_natural

Reputation: 133

Is there a simple function to exclude the training set from a dataset in Python?

I have a question about splitting a dataset in Python. If I already have a subset of the dataset to use as the training set, is there a function in Python that excludes the training set from the dataset and returns the rest directly? Something like:

testing_set = numpy.exclude(dataset, training_set)  # wished-for function; NumPy has no exclude()

For example, if the dataset has 10 rows and I have taken rows 2, 4, 7 and 9 as the training set, how can I easily get the rest of the dataset? Specifically, this is how I split my data:

for i in range(0,5):
  Test_data = dataset[ratio*i:ratio*(i+1),:]
  Train_data = dataset[0:ratio*i&ratio*(i+1):-1,:] 

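My code doesn't work: & cannot join two slices together (on integers it is just bitwise AND), so the slice for Train_data does not select the rows outside the test block.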
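A minimal NumPy sketch of the fold loop above, assuming dataset is a 2-D NumPy array and ratio is the number of rows per fold. The 10-row array is hypothetical sample data. np.delete with axis=0 returns a copy of the array with the given rows removed, which is the "exclude" operation the question asks for.

```python
import numpy as np

# Hypothetical stand-in for the asker's data: 10 rows, 3 columns
dataset = np.arange(30).reshape(10, 3)
n_folds = 5
ratio = len(dataset) // n_folds  # rows per fold (assumes an even division)

for i in range(n_folds):
    # Rows ratio*i .. ratio*(i+1)-1 form the test block for this fold
    test_rows = np.arange(ratio * i, ratio * (i + 1))
    Test_data = dataset[test_rows, :]
    # np.delete returns a copy of dataset with those rows removed (axis=0)
    Train_data = np.delete(dataset, test_rows, axis=0)
    print(i, Test_data.shape, Train_data.shape)
```

Joining the two slices on either side of the test block, np.concatenate((dataset[:ratio*i], dataset[ratio*(i+1):])), gives the same training rows and is closer to what the original & was meant to express.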

Upvotes: 1

Views: 255

Answers (1)

Toby Petty

Reputation: 4680

If dataset is a pandas DataFrame and you already know the index labels of the training set rows, you can exclude them to get the labels of the remaining rows:

training_rows_ix = [2, 4, 7, 9]
# keep every index label that is not one of the training rows
non_training_rows = [i for i in dataset.index if i not in training_rows_ix]
test_set = dataset.loc[non_training_rows]
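A self-contained sketch of the same idea, swapping in pandas' DataFrame.drop, which removes rows by index label in one call. The 10-row DataFrame is hypothetical, and it assumes the default 0..9 index so that labels and row positions coincide.

```python
import pandas as pd

# Hypothetical 10-row DataFrame with the default RangeIndex 0..9
dataset = pd.DataFrame({"x": range(10), "y": range(10, 20)})
training_rows_ix = [2, 4, 7, 9]

training_set = dataset.loc[training_rows_ix]
# drop(index=...) returns a new DataFrame without the listed labels
test_set = dataset.drop(index=training_rows_ix)

print(test_set.index.tolist())  # [0, 1, 3, 5, 6, 8]
```

Note that drop and loc both work on labels, not positions; if the index has been reset or filtered, the labels may no longer match the row numbers.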

Or use set operations instead of a list comprehension:

non_training_rows = sorted(set(dataset.index) - set(training_rows_ix))
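If dataset is a plain NumPy array rather than a DataFrame, the same set difference can be taken on row positions with np.setdiff1d, which returns the sorted unique values of its first argument that are not in its second. A minimal sketch with a hypothetical 10-row array:

```python
import numpy as np

# Hypothetical plain NumPy array standing in for the dataset: 10 rows, 2 columns
dataset = np.arange(20).reshape(10, 2)
training_rows_ix = [2, 4, 7, 9]

# Positions 0..9 minus the training positions, already sorted
non_training_rows = np.setdiff1d(np.arange(len(dataset)), training_rows_ix)
test_set = dataset[non_training_rows]

print(non_training_rows)  # [0 1 3 5 6 8]
```

A boolean mask works too: start from np.ones(len(dataset), dtype=bool), set the training positions to False, and index dataset with the mask.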

Also, for a more robust way to split a dataset into training and test sets, look into scikit-learn's train_test_split.
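A short sketch of the scikit-learn route, assuming scikit-learn is installed and using a hypothetical 10-row array. train_test_split makes one random split; for the asker's 5-fold loop, sklearn.model_selection.KFold yields the train and test row indices for each fold, so no manual exclusion is needed.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

dataset = np.arange(30).reshape(10, 3)  # hypothetical 10-row dataset

# One random split: 40% of rows go to the test set; random_state fixes the shuffle
train_set, test_set = train_test_split(dataset, test_size=0.4, random_state=0)

# 5-fold split: with the default shuffle=False each test fold is a contiguous
# block of rows, matching the original slicing; train_index holds all other rows
for train_index, test_index in KFold(n_splits=5).split(dataset):
    Train_data = dataset[train_index]
    Test_data = dataset[test_index]
```

Passing shuffle=True (with a random_state) to KFold randomizes which rows land in each fold, which is usually preferable when the rows are ordered.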

Upvotes: 2
