Rayane Bouslimi

Reputation: 193

Issues with scikit-learn machine learning in Python

I'm trying to reproduce a tutorial seen here.

Everything works perfectly until I call the .fit method with my training set.

Here is a sample of my code:

# TRAINING PART

train_dir = 'pdf/learning_set'
dictionary = make_dic(train_dir)

train_labels = np.zeros(20)
train_labels[17:20] = 1
train_matrix = extract_features(train_dir)
model1 = MultinomialNB()
model1.fit(train_matrix, train_labels)


# TESTING PART

test_dir = 'pdf/testing_set'
test_matrix = extract_features(test_dir)
test_labels = np.zeros(8)
test_labels[4:7] = 1
result1 = model1.predict(test_matrix)
print(confusion_matrix(test_labels, result1))

Here is my traceback:

Traceback (most recent call last):
  File "ML.py", line 65, in <module>
    model1.fit(train_matrix, train_labels)
  File "/usr/local/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 579, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [23, 20]

How can I solve this issue? I'm working on Ubuntu 16.04 with Python 3.6.

Upvotes: 0

Views: 73

Answers (1)

Florian H

Reputation: 3082

ValueError: Found input variables with inconsistent numbers of samples: [23, 20]

That means you have 23 training vectors (train_matrix has 23 rows) but only 20 training labels (train_labels is an array of 20 values). The .fit method needs exactly one label per row.

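Change train_labels = np.zeros(20) to train_labels = np.zeros(23) and it should work. Also check that the slice marking the positive class (train_labels[17:20] = 1) still points at the right documents now that the array covers all 23 rows.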
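To avoid hard-coding the length at all, you could size the label array from the feature matrix itself. The following is only a sketch: make_labels is a hypothetical helper, the random matrix stands in for the output of extract_features(train_dir), and the choice of rows 20 to 22 as the positive class is an assumption made purely for illustration.

```python
import numpy as np


def make_labels(feature_matrix, positive_rows):
    """Build a 0/1 label vector with one entry per row of feature_matrix.

    Hypothetical helper, not part of scikit-learn.
    """
    n_samples = feature_matrix.shape[0]
    labels = np.zeros(n_samples)
    labels[list(positive_rows)] = 1
    return labels


# Stand-in for extract_features(train_dir): 23 documents, 5 made-up features.
train_matrix = np.random.randint(0, 10, size=(23, 5))

# Assumption for illustration: the last three documents are the positive class.
train_labels = make_labels(train_matrix, range(20, 23))

# The same consistency check that .fit performs internally, done up front.
assert train_matrix.shape[0] == train_labels.shape[0]
print(train_matrix.shape[0], train_labels.shape[0])
```

The same check is worth running on test_matrix and test_labels before calling confusion_matrix, since a mismatch there would raise the same kind of error.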

Upvotes: 1
