Reputation: 1485
I am trying to fit a simple OLS model with statsmodels by passing in two NumPy arrays that have column names. However, when I try to fit the model, I get this error:
ValueError: exog is not 1d or 2d
To make the example reproducible, I have used the Boston dataset from sklearn and created the arrays from it. My code is as follows:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn import datasets ## imports datasets from scikit-learn
data = datasets.load_boston() ## loads Boston dataset from datasets library
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = pd.DataFrame(data.target, columns=["MEDV"])
Y = Y.to_numpy(dtype=[('MEDV', 'float64')])
X = df.to_numpy(dtype=[('CRIM', 'float64'), ('ZN', 'float64'), ('INDUS', 'float64'), ('CHAS', 'float64'), ('NOX', 'float64'),
('RM', 'float64'), ('AGE', 'float64'), ('DIS', 'float64'), ('RAD', 'float64'), ('TAX', 'float64'),
('PTRATIO', 'float64'), ('B', 'float64'), ('LSTAT', 'float64')])
model = sm.OLS(Y, X).fit()
This does not make sense to me: my Y variable is a column vector of numbers, so it is surely 1D or 2D.
Does anyone understand why I am receiving this error?
Upvotes: 1
Views: 1626
Reputation: 6495
The simple fix is to call to_numpy() without the structured dtype argument:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn import datasets ## imports datasets from scikit-learn
data = datasets.load_boston() ## loads Boston dataset (load_boston was removed in scikit-learn 1.2, so this needs an older version)
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = pd.DataFrame(data.target, columns=["MEDV"])
X = df.to_numpy()
Y = Y.to_numpy()
model = sm.OLS(Y, X).fit()
Let's see the differences between the two approaches:
Y = pd.DataFrame(data.target, columns=["MEDV"])
(Y.to_numpy(dtype=[('MEDV', 'float64')]))[:10]
array([[(24. ,)],
[(21.6,)],
[(34.7,)],
[(33.4,)],
[(36.2,)],
[(28.7,)],
[(22.9,)],
[(27.1,)],
[(16.5,)],
[(18.9,)]], dtype=[('MEDV', '<f8')])
# That is a structured array: each element is a record (shown as a 1-tuple), not a plain float
Y.to_numpy()[:10]
array([[24. ],
[21.6],
[34.7],
[33.4],
[36.2],
[28.7],
[22.9],
[27.1],
[16.5],
[18.9]])
# This is an array of floats
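The difference can also be checked without the dataset. This is a minimal sketch using only NumPy; the three-row array is made-up stand-in data, and it assumes (as the output above suggests) that passing a structured dtype amounts to a cast of the 2D values:

```python
import numpy as np

# Made-up stand-in for the target column (the first three MEDV values shown above).
y = np.array([[24.0], [21.6], [34.7]])

# Casting to a structured dtype keeps the (3, 1) shape, but every element
# becomes a record with a named field instead of a plain float.
y_struct = y.astype([("MEDV", "float64")])
print(y_struct.dtype.names)  # field names of the structured dtype
print(y_struct.shape)

# Selecting the field by name gives back an ordinary float array.
y_plain = y_struct["MEDV"]
print(y_plain.dtype, y_plain.shape)
```

So the names live in the dtype, not in the array's shape, and code that expects a plain numeric 2D array can trip over the record elements.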
The exact same happens for X, which is the exog argument that the error message refers to.
Upvotes: 1