LookingForSomething

Reputation: 134

Found input variables with inconsistent numbers of samples: [100, 300]

I am a beginner in this field and was trying to fit a logistic regression model to a data set. The code is as follows:

import numpy as np
from matplotlib import pyplot as plt
import pandas as pnd
from sklearn.preprocessing import Imputer, LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Import the dataset
data_set = pnd.read_csv("/Users/Siddharth/PycharmProjects/Deep_Learning/Classification Template/Social_Network_Ads.csv")
X = data_set.iloc[:, [2,3]].values
Y = data_set.iloc[:, 4].values

# Splitting the set into training set and testing set
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Scaling the variables
scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)
x_train = scaler_x.transform(x_test)

# Fitting Logistic Regression to training data
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

# Predicting the test set results
y_prediction = classifier.predict(x_test)

# Making the confusion matrix
conMat = confusion_matrix(y_true=y_test, y_pred=y_prediction)
print(conMat)

The error is raised by the call classifier.fit(x_train, y_train). The full traceback is:

Traceback (most recent call last):
  File "/Users/Siddharth/PycharmProjects/Deep_Learning/Logistic_regression.py", line 24, in <module>
    classifier.fit(x_train, y_train)
  File "/usr/local/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit
    order="C")
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [100, 300]

I have no clue why this is happening. Any help will be appreciated. Thank you!

Upvotes: 1

Views: 8666

Answers (1)

Y. Luo

Reputation: 5732

It looks like you have a typo in the scaling step. You probably want:

x_test = scaler_x.transform(x_test)

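rather than x_train = scaler_x.transform(x_test). That line overwrites x_train with the scaled test set, so x_train ends up with 100 rows while y_train still has 300 labels (and x_test is never scaled). The error reports exactly this mismatch in sample counts: [100, 300].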
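To make the mismatch concrete, here is a minimal sketch of the corrected pipeline. It assumes scikit-learn and NumPy are installed, and it uses synthetic data as a stand-in for Social_Network_Ads.csv (400 rows and 2 features are assumptions inferred from the [100, 300] split, not the real file). The last part reproduces the original error on purpose by pairing the 100 test rows with the 300 training labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CSV (assumption: 400 rows, 2 features, binary label).
rng = np.random.RandomState(0)
X = rng.normal(size=(400, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)  # fit the scaler on training data only
x_test = scaler_x.transform(x_test)        # reuse the training mean and std

# Cheap sanity check: features and labels must have the same number of rows.
assert x_train.shape[0] == y_train.shape[0]
assert x_test.shape[0] == y_test.shape[0]

classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
y_prediction = classifier.predict(x_test)
print(x_train.shape, x_test.shape, y_prediction.shape)

# Reproduce the original mistake: 100 feature rows against 300 labels.
message = ""
try:
    LogisticRegression(random_state=0).fit(x_test, y_train)
except ValueError as err:
    message = str(err)
print(message)
```

Printing the shapes (or asserting on them, as above) right after the scaling step is a quick way to catch this kind of variable mix-up before calling fit.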

Upvotes: 4
