Reputation: 1044
I need to create a model that classifies records based on a categorical variable. For instance, if a record has predictor A or B, I want it to be classified as having predicted value X. The actual data is in this form:
Predicted Predictor
X A
X B
Y D
X A
For my solution, I did the following:
1. Used LabelEncoder to create numerical values for the Predicted column.
2. The predictor variable has multiple categories, which I parsed into individual columns using get_dummies.
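These two preprocessing steps can be sketched on a toy frame mirroring the data shown above (the column names follow the question; the frame itself is made up):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# toy data in the same shape as the question's example
df = pd.DataFrame({"Predicted": ["X", "X", "Y", "X"],
                   "Predictor": ["A", "B", "D", "A"]})

# step 1: LabelEncoder turns the Predicted strings into integer codes
le = LabelEncoder()
df["Predicted"] = le.fit_transform(df["Predicted"])

# step 2: get_dummies expands Predictor into one indicator column per category
df = pd.get_dummies(df, columns=["Predictor"])

print(df.columns.tolist())
# ['Predicted', 'Predictor_A', 'Predictor_B', 'Predictor_D']
```

`le.inverse_transform` can later map the integer predictions back to the original X/Y labels.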
Here is a sub-section of the dataframe with the encoded Predicted column and a couple of the dummy predictor columns (pardon the misalignment):
Predicted Predictor_A Predictor_B
9056 30 0 0
2482 74 1 0
3407 56 1 0
12882 15 0 0
7988 30 0 0
13032 12 0 0
9738 28 0 0
6739 40 0 0
373 131 0 0
3030 62 0 0
8964 30 0 0
691 125 0 0
6214 41 0 0
6438 41 1 0
5060 42 0 0
3703 49 0 0
12461 16 0 0
2235 75 0 0
5107 42 0 0
4464 46 0 0
7075 39 1 0
11891 16 0 0
9190 30 0 0
8312 30 0 0
10328 24 0 0
1602 97 0 0
8804 30 0 0
8286 30 0 0
6821 40 0 0
3953 46 1
After reshaping the data into the dataframe shown above, I try using MultinomialNB from sklearn. When doing so, the error I run into is:
ValueError: Found input variables with inconsistent numbers of samples: [1, 8158]
I run into this error even when trying it with a dataframe that has only 2 columns: Predicted and Predictor_A.
My question is: what is causing this error, and how do I fix it?
Upvotes: 1
Views: 177
Reputation: 33147
To fit the MultinomialNB model, you need the training samples with their features and their corresponding labels (target values). In your case, Predicted is the target variable, and Predictor_A and Predictor_B are the features (predictors).
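The ValueError you hit means that the X and y you passed to fit disagree on the number of samples (their first dimensions differ, here 1 vs. 8158). A minimal sketch reproducing and fixing it, with made-up numbers:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

y = np.array([30, 74, 56, 15, 30])   # 5 labels, one per sample
X = np.array([[0, 1, 0, 0, 1]])      # shape (1, 5): ONE sample with 5 features

clf = MultinomialNB()
try:
    clf.fit(X, y)
except ValueError as e:
    msg = str(e)
print(msg)
# Found input variables with inconsistent numbers of samples: [1, 5]

# the fix: one ROW per sample, one COLUMN per feature
X = np.array([0, 1, 0, 0, 1]).reshape(-1, 1)  # shape (5, 1)
clf.fit(X, y)                                  # now X and y both have 5 samples
```

Selecting the features with a double-bracket `df[['Predictor_A']]` (as in the example below) keeps X two-dimensional and avoids this shape problem.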
Example 1:
from sklearn.naive_bayes import MultinomialNB
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv("dt.csv", sep=r"\s+")  # whitespace-delimited file
# X holds the feature columns
X = df[['Predictor_A', 'Predictor_B']]
# y holds the labels (targets / classes)
y = df['Predicted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf = MultinomialNB()
clf.fit(X_train, y_train)
clf.predict(X_test)
# every test sample is predicted as class 30
# this result makes sense if you look at X_test: the samples are all similar
print(X_test)
Predictor_A Predictor_B
8286 0 0
12461 0 0
6214 0 0
9190 0 0
373 0 0
3030 0 0
11891 0 0
9056 0 0
8804 0 0
6438 1 0
# get the per-class probabilities
clf.predict_proba(X_test)
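To make the shape of that predict_proba output concrete, here is a tiny stand-in with two dummy columns and made-up labels (not the question's actual data):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# two dummy feature columns, made-up class labels
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([30, 74, 30, 30])

clf = MultinomialNB().fit(X, y)
proba = clf.predict_proba(X)

print(proba.shape)        # (4, 2): one row per sample, one column per class
print(clf.classes_)       # [30 74]: the column order of predict_proba
print(proba.sum(axis=1))  # each row sums to 1.0
```

The columns of predict_proba line up with clf.classes_, so the predicted class is simply the column with the highest probability in each row.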
Note: The data that I used can be found here
If you train the model on documents that have, say, 4 tags (predictors), then any new document you want to predict on must also have the same number of tags.
Example 2:
clf.fit(X, y)
Here, X is a [29, 2] array: 29 training samples (documents), each with 2 tags (predictors).
clf.predict(X_new)
Here, X_new can be [n, 2]: we can predict the classes of n new documents, but each of these new documents must also have exactly 2 tags (predictors).
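A short sketch of Example 2's point, with made-up numbers (4 training documents instead of 29): predicting on new documents with the matching number of tags works, while a mismatched document raises an error.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# train on documents with exactly 2 tags (features)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([30, 74, 56, 30])
clf = MultinomialNB().fit(X, y)

# new documents must also have exactly 2 tags
X_new = np.array([[1, 0], [0, 0]])
pred = clf.predict(X_new)   # one predicted class per new document

# a document with 3 tags does not match the trained model
try:
    clf.predict(np.array([[1, 0, 1]]))
    mismatch = False
except ValueError:
    mismatch = True        # sklearn rejects the feature-count mismatch
```

The same rule applies to get_dummies output: the dummy columns of the new data must match the dummy columns the model was trained on, in both count and order.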
Upvotes: 1