SpartanDawg

Reputation: 52

Sklearn Linear Regression fit input order? Does exogenous variable go first?

The reference page says:

Parameters: 
X : array-like or sparse matrix, shape (n_samples, n_features)
Training data

y : array_like, shape (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary

Is X the exogenous variable? I would assume so, but with statsmodels OLS the endogenous variable comes first, so I want to confirm, because the two yield different coefficients.

Upvotes: 1

Views: 266

Answers (1)

Parthasarathy Subburaj

Reputation: 4264

Yes, you are correct: X is the exogenous variable. The order in which you pass the exogenous and endogenous variables in sklearn (this holds for other sklearn models as well) is reversed compared to the statsmodels OLS class.

If X = exogenous variable and Y = endogenous

In sklearn you would do something like this:

clf.fit(X,Y)

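whereas in statsmodels the data is passed to the model constructor, endogenous variable first, and fit() is then called without the data:

sm.OLS(Y,X).fit()

Here clf is the sklearn model you are trying to build and sm is statsmodels.api.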

Upvotes: 1
