James Drain

Reputation: 21

In linear regression in sklearn in Python, inexplicable dimension mismatch error

In sklearn, I want to train a linear model with one-dimensional input. But when I feed a [100 x 1] input vector and a [100 x 1] output vector into linear_model.LinearRegression()'s fit function, I get the error "ValueError: Found arrays with inconsistent numbers of samples: [ 1 100]". It works fine with a [7791 x 39] training input and a [7791 x 1] training output.

starting regression training
(7791, 39)
(7791,)
done with regression training; starting probabilities converter training
(100,)
(100,)
Traceback (most recent call last):
  File "makePickles.py", line 19, in <module>
    train_probabilities_converter(scoresToProbabilities[:,1], scoresToProbabilities[:,2])
  File "trainProbabilitiesConverter.py", line 18, in train_probabilities_converter
    regr.fit(rawScores, empiricalProbability)
  File "//anaconda/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 376, in fit
    y_numeric=True, multi_output=True)
  File "//anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 454, in check_X_y
    check_consistent_length(X, y)
  File "//anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 174, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [  1 100]

Upvotes: 2

Views: 526

Answers (1)

P. Camilleri

Reputation: 13218

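Have you tried making your input data (100, 1) instead of (100,)? This is a common stumbling block with sklearn: it expects X to be 2-D with shape (n_samples, n_features), and a 1-D array of length 100 is ambiguous, because it could be 100 observations in dimension 1 or 1 observation in dimension 100.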

You can do X_test = X_test[:, None] to add a new axis. np.newaxis also works; it is a longer but more explicit name. In fact, it is just an alias for None (they refer to the same object):

>>> import numpy as np
>>> np.newaxis is None
True
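To make the fix concrete, here is a minimal sketch of reshaping a 1-D feature vector before fitting. The arrays raw_scores and empirical_probability are made-up stand-ins for the two columns in the question (the real data is not shown there), so treat the names and values as assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the asker's data: 100 samples, one feature.
raw_scores = np.linspace(0.0, 1.0, 100)          # shape (100,)
empirical_probability = 2.0 * raw_scores + 0.5   # shape (100,)

# Turn the 1-D feature vector into a single-column 2-D array.
X = raw_scores[:, np.newaxis]      # shape (100, 1); same as raw_scores[:, None]
X_alt = raw_scores.reshape(-1, 1)  # equivalent way to get shape (100, 1)

regr = LinearRegression()
regr.fit(X, empirical_probability)  # the target y can stay 1-D

print(X.shape, X_alt.shape)
print(regr.coef_, regr.intercept_)
```

Only X needs the extra axis; a 1-D y of length 100 is accepted as 100 targets.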

Upvotes: 2
