spicy burrito

Reputation: 197

python scikit learn logistic regression error

I am trying to plot a logistic regression graph from the following data

X = np.array([0,1,2,3,4,5,6,7,8,9,10,11])
y = np.array([0,0,0,0,1,0,1,0,1,1,1,1])

However when I try:

import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model

X = np.array([0,1,2,3,4,5,6,7,8,9,10,11])
y = np.array([0,0,0,0,1,0,1,0,1,1,1,1])

clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, y)

I get the following error:

ValueError: Found input variables with inconsistent numbers of samples: [1, 12]

I am a bit confused why it thinks that either X or y has only one sample.

Upvotes: 1

Views: 4158

Answers (1)

MaxU - stand with Ukraine

Reputation: 210812

Modern versions of sklearn expect a 2D array for X (shape `(n_samples, n_features)`), so reshape it as the error message suggests:

In [7]: clf.fit(X.reshape(-1,1), y)
Out[7]:
LogisticRegression(C=100000.0, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)

BTW, sklearn 0.19.1 gives me a clear error message:

In [10]: sklearn.__version__
Out[10]: '0.19.1'

In [11]: clf.fit(X, y)
...
skipped
...
ValueError: Expected 2D array, got 1D array instead:
array=[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
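To make the two reshape options in that message concrete, here is a quick sketch (my addition, using a small array for illustration) of the shapes each one produces:

```python
import numpy as np

X = np.array([0, 1, 2, 3])      # 1D array, shape (4,)

# one feature, many samples -> a column vector
print(X.reshape(-1, 1).shape)   # (4, 1)

# one sample, many features -> a row vector
print(X.reshape(1, -1).shape)   # (1, 4)
```

In your case each number in X is one sample with a single feature, so `reshape(-1, 1)` is the right choice.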

UPDATE: full code:

In [41]: %paste
import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model
import sklearn

X = np.array([0,1,2,3,4,5,6,7,8,9,10,11])
y = np.array([0,0,0,0,1,0,1,0,1,1,1,1])

print('SkLearn version: {}'.format(sklearn.__version__))

clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X.reshape(-1,1), y)

## -- End pasted text --
SkLearn version: 0.19.1
Out[41]:
LogisticRegression(C=100000.0, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
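Since the original goal was to plot the logistic regression curve, here is one possible sketch of how the fitted model could be plotted (my addition, not part of the original answer; the grid size of 100 points is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
y = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X.reshape(-1, 1), y)

# evaluate the predicted probability of class 1 on a fine grid
grid = np.linspace(0, 11, 100).reshape(-1, 1)
probs = clf.predict_proba(grid)[:, 1]

plt.scatter(X, y, label='data')
plt.plot(grid.ravel(), probs, label='fitted sigmoid')
plt.legend()
```

The scatter shows the raw 0/1 labels and the line shows the model's sigmoid; with C=1e5 (weak regularization) the curve should rise steeply around the middle of the data.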

Upvotes: 2
