user3140106

Reputation: 377

Joining two lists of lists, of different sizes in Python

I'm trying to write code for 10-fold cross validation.

That is, dividing the data into 10 equally sized chunks. Then, for each of the 10 iterations, take out the ith chunk and use the remaining 90% as training data.

For the second iteration, I need to join the first 10% with the last 80%. For the third iteration, I join the first 20% with the last 70%. Etc.

(So in the first iteration the first 10% is removed, in the second iteration the second 10% is removed, etc.)

My data consists of 1000 items, each of which is an array of 70 values of type np.float.

This function will be called once for each of the 10 validations, with i = 0, 1, ..., 9:

def get_training(input_array, i):
    training = (input_array[:i*subset_size] + input_array[(i+1)*subset_size:])
    return training

It worked earlier, but now I'm getting the error:

operands could not be broadcast together with shapes (100,70) (800,70)

I think this may be due to the np.float datatype; it was working earlier with another data type.

Thanks

Upvotes: 0

Views: 788

Answers (2)

zhangxaochen

Reputation: 34027

Don't reinvent the wheel. You can use the KFold and StratifiedKFold classes from the sklearn.cross_validation module (in scikit-learn 0.18 and later they live in sklearn.model_selection).

See the docs:

K-Folds cross validation iterator.

Provides train/test indices to split data in train test sets. Split dataset into k consecutive folds (without shuffling).

Each fold is then used as a validation set once while the k - 1 remaining folds form the training set.
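As a rough sketch of how this could look for the data in the question: the snippet below assumes scikit-learn 0.18 or later (where KFold is imported from sklearn.model_selection) and uses random placeholder data of shape (1000, 70) in place of the real array. The variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder standing in for the asker's 1000 items of 70 floats each.
data = np.random.rand(1000, 70)

# n_splits=10 gives 10 consecutive folds; shuffling is off by default.
kf = KFold(n_splits=10)

for fold, (train_idx, test_idx) in enumerate(kf.split(data)):
    training = data[train_idx]   # the 9 remaining folds, 900 rows
    held_out = data[test_idx]    # the fold held out this iteration, 100 rows
    print(fold, training.shape, held_out.shape)
```

KFold only yields index arrays, so the same indices can be used to slice a matching label array if there is one.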

Upvotes: 2

lynn

Reputation: 10794

Try:

training = np.concatenate((input_array[:i*subset_size], input_array[(i+1)*subset_size:]))

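(For numpy arrays, the + operator does not concatenate as it does for lists. It adds the values element-wise, which requires the two arrays to have the same or broadcastable shapes. That is why shapes (100,70) and (800,70) fail:)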

import numpy as np
a = np.array(range(10))
print(a + a)                # => [ 0  2  4  6  8 10 12 14 16 18]
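Putting this together, a minimal sketch of the corrected function might look like the following. It assumes subset_size is passed in as an argument (the question's code reads it from an enclosing scope) and uses random placeholder data of the shape described in the question.

```python
import numpy as np

def get_training(input_array, i, subset_size):
    """Return all rows except the i-th chunk of subset_size rows."""
    before = input_array[:i * subset_size]
    after = input_array[(i + 1) * subset_size:]
    # Join the two pieces along the first axis instead of adding them.
    return np.concatenate((before, after))

# Placeholder data: 1000 items of 70 floats each.
data = np.random.rand(1000, 70)
subset_size = len(data) // 10

# i=1 reproduces the failing case: pieces of shape (100, 70) and (800, 70).
training = get_training(data, 1, subset_size)
print(training.shape)  # (900, 70)
```

np.concatenate also accepts plain lists of lists, so the same function should work whether the input is a list or an array; the result is always a numpy array.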

Upvotes: 1
