Ed Rahn

Reputation: 51

Ordinary least squares regression giving wrong prediction

I am using statsmodels OLS to fit a series of points to a line:

import statsmodels.api as sm
Y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15]
X = [[73.759999999999991], [73.844999999999999], [73.560000000000002], 
    [73.209999999999994], [72.944999999999993], [73.430000000000007], 
    [72.950000000000003], [73.219999999999999], [72.609999999999999], 
    [74.840000000000003], [73.079999999999998], [74.125], [74.75],
    [74.760000000000005]]

ols = sm.OLS(Y, X)
r = ols.fit()
preds = r.predict()
print(preds)

And I get the following results:

[ 7.88819844  7.89728869  7.86680961  7.82937917  7.80103898  7.85290687
  7.8015737   7.83044861  7.76521269  8.00369809  7.81547643  7.92723304
  7.99407312  7.99514256]

These are about 10 times off. What am I doing wrong? I tried adding a constant, but that just makes the values 1000 times bigger. I don't know much about statistics, so maybe there is something I need to do with the data?

Upvotes: 4

Views: 1346

Answers (1)

Nate

Reputation: 1948

I think you have switched your response and your predictor, like Michael Mayer suggested in his comment. If you plot the data with predictions from your model, you get something like this:

import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

Y = np.array([1,2,3,4,5,6,7,8,9,11,12,13,14,15])
X = np.array([ 73.76 ,  73.845,  73.56 ,  73.21 ,  72.945,  73.43 ,  72.95 ,
    73.22 ,  72.61 ,  74.84 ,  73.08 ,  74.125,  74.75 ,  74.76 ])
Design = np.column_stack((np.ones(14), X))
ols = sm.OLS(Y, Design).fit()
preds = ols.predict()

plt.plot(X, Y, 'ko')
plt.plot(X, preds, 'k-')
plt.show()


If you switch X and Y, which is what I think you want, you get:

Design2 = np.column_stack((np.ones(14), Y))
ols2 = sm.OLS(X, Design2).fit()
preds2 = ols2.predict()
print(preds2)
[ 73.1386399   73.21305699  73.28747409  73.36189119  73.43630829
  73.51072539  73.58514249  73.65955959  73.73397668  73.88281088
  73.95722798  74.03164508  74.10606218  74.18047927]

plt.plot(Y, X, 'ko')
plt.plot(Y, preds2, 'k-')
plt.show()

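
As a rough cross-check, here is a minimal pure-Python sketch of the two fits discussed in this thread. It assumes the question's sm.OLS(Y, X) call, which has no constant column, amounts to a least-squares line forced through the origin; that would explain why every prediction lands near 7.9 when X sits around 73.5. The helper names fit_through_origin and fit_with_intercept are made up for illustration and are not part of statsmodels.

```python
# Pure-Python sketch; the helper names are illustrative, not statsmodels API.
Y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15]
X = [73.76, 73.845, 73.56, 73.21, 72.945, 73.43, 72.95,
     73.22, 72.61, 74.84, 73.08, 74.125, 74.75, 74.76]


def fit_through_origin(x, y):
    """Least-squares slope for the model y = b * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_with_intercept(x, y):
    """Least-squares (intercept, slope) for the model y = a + b * x."""
    n = float(len(x))
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Fit from the question: Y on X with no constant term.
b0 = fit_through_origin(X, Y)
preds_no_const = [b0 * x for x in X]

# Swapped fit from this answer: X on Y with a constant term.
a1, b1 = fit_with_intercept(Y, X)
preds_swapped = [a1 + b1 * y for y in Y]

print(preds_no_const[:3])
print(preds_swapped[:3])
```

If the assumption holds, the first list should line up with the predictions printed in the question and the second with preds2 above.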

Upvotes: 5
