Reputation: 377
I'm trying to write code for 10-fold cross validation.
That is, dividing the data into 10 equally sized chunks. Then, for each of 10 iterations, take out the ith chunk, and use the remaining 90% as training data.
For the second iteration, I need to join the first 10% with the last 80%. For the third iteration, I join the first 20% with the last 70%. Etc.
(So in the first iteration the first 10% is removed, in the second iteration the second 10% is removed, etc.)
My data consists of 1000 items, each of which is an array of 70 values of type np.float.
This function will be called for each of the 10 validations, with i=0, i=1, ..., i=9:
def get_training(input_array, i):
    training = (input_array[:i*subset_size] + input_array[(i+1)*subset_size:])
    return training
It worked earlier, but now I'm getting the error:
operands could not be broadcast together with shapes (100,70) (800,70)
I think this may be due to the np.float datatype; it was working earlier with another data type.
Thanks
Upvotes: 0
Views: 788
Reputation: 34027
Don't reinvent the wheel. You can use the KFold and StratifiedKFold classes in the sklearn.cross_validation module (sklearn.model_selection in newer scikit-learn releases).
See the docs:
K-Folds cross validation iterator.
Provides train/test indices to split data in train test sets. Split dataset into k consecutive folds (without shuffling).
Each fold is then used as a validation set once while the k - 1 remaining folds form the training set.
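A minimal sketch of how the iterator can be used, assuming a recent scikit-learn release where KFold is imported from sklearn.model_selection. The random array is only a placeholder shaped like the data in the question (1000 items of 70 floats), and the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder data shaped like the question's: 1000 items, 70 floats each.
data = np.random.rand(1000, 70)

kf = KFold(n_splits=10)  # consecutive folds, no shuffling by default
fold_shapes = []
for train_idx, test_idx in kf.split(data):
    training = data[train_idx]  # rows from the 9 remaining folds
    held_out = data[test_idx]   # rows from the fold left out this iteration
    fold_shapes.append((training.shape, held_out.shape))

print(fold_shapes[0])
```

Each iteration yields index arrays rather than the data itself, so the same indices can also be used to slice a separate label array.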
Upvotes: 2
Reputation: 10794
Try:
training = np.concatenate((input_array[:i*subset_size], input_array[(i+1)*subset_size:]))
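(For numpy arrays, the + operator adds values element-wise instead of joining the arrays, so the two slices must have compatible shapes. With plain Python lists, + concatenates, which is likely why this worked earlier. For example:)
import numpy as np
a = np.array(range(10))
print(a + a)  # => [ 0  2  4  6  8 10 12 14 16 18]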
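Putting the np.concatenate fix into the original function, a sketch might look like the following. Here subset_size is passed in as an argument, which is an assumption (in the question it is defined somewhere outside the function), and the random array is placeholder data.

```python
import numpy as np

def get_training(input_array, i, subset_size):
    """Return input_array with the i-th chunk of subset_size rows removed."""
    before = input_array[:i * subset_size]
    after = input_array[(i + 1) * subset_size:]
    # concatenate joins the rows along axis 0; + would attempt element-wise addition
    return np.concatenate((before, after))

data = np.random.rand(1000, 70)  # placeholder for the real data
subset_size = len(data) // 10    # 100 rows per fold

training_shapes = [get_training(data, i, subset_size).shape for i in range(10)]
print(training_shapes[1])  # (900, 70)
```

For i=0 the first slice is empty, with shape (0, 70), and concatenating it is harmless, so no special case is needed for the first or last fold.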
Upvotes: 1