Joining two lists of lists, of different sizes in Python

Question

I'm trying to write code for 10-fold cross validation.

That is, I divide the data into 10 equally sized chunks. Then, for each of 10 iterations, I take out the ith chunk and use the remaining 90% as training data.

For the second iteration, I need to join the first 10% with the last 80%. For the third iteration, I join the first 20% with the last 70%. Etc.

(So the first iteration, the first 10% is removed, the second iteration, the 2nd 10% is removed, etc.)

My data consists of 1000 items, each of which is an array of 70 values of type np.float.

This function is called once for each of the 10 validations, with i = 0, 1, ..., 9:

def get_training(input_array, i):
    training = (input_array[:i*subset_size] + input_array[(i+1)*subset_size:])
    return training

It worked earlier, but now I'm getting the error:

operands could not be broadcast together with shapes (100,70) (800,70)

I think this may be due to the np.float datatype; it was working earlier with another data type.

Thanks

zhangxaochen · Accepted Answer

Don't reinvent the wheel. You can use the KFold and StratifiedKFold classes in the sklearn.cross_validation module (in scikit-learn 0.18 and later they live in sklearn.model_selection).

See the docs:

K-Folds cross validation iterator.

Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds (without shuffling).

Each fold is then used as a validation set once while the k - 1 remaining folds form the training set.
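
As for the error itself: the shapes (100,70) and (800,70) in the message are exactly the two slices for i=1, which suggests the input is now a NumPy array. With NumPy arrays, + is elementwise addition rather than list concatenation, so the dtype is probably not the cause. To illustrate what the docs above describe, here is a minimal pure-Python sketch of consecutive, unshuffled folds. It is not scikit-learn's implementation; the name k_fold_indices and the toy data are hypothetical and only stand in for the 1000 x 70 dataset from the question.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for consecutive, unshuffled folds.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Everything before and after the held-out fold is training data.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Toy data standing in for the question's 1000 items of 70 floats each.
data = [[float(r)] * 70 for r in range(1000)]

folds = list(k_fold_indices(len(data), 10))

# Second iteration (i=1): rows 100..199 are held out.
train_idx, test_idx = folds[1]
training = [data[j] for j in train_idx]
print(len(training), len(test_idx))  # prints: 900 100
```

Working with indices instead of joining slices sidesteps the + problem entirely. If the data is a NumPy array, the same index lists can be used to select rows (input_array[train_idx]), and np.concatenate is the NumPy counterpart of joining two lists with +.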
