Makaroniiii

Reputation: 348

Very low score with scikit-learn linear regression for obvious pattern

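import numpy as np
from sklearn.linear_model import LinearRegression as LR
# train_test_split has lived in sklearn.model_selection since scikit-learn 0.18;
# sklearn.cross_validation was removed in 0.20.
from sklearn.model_selection import train_test_split

# Target: alternates between 1001 (even index) and 999 (odd index).
x1 = []
for i in range(1000):
    if i%2 == 0:
        x1.append(1001)
    else:
        x1.append(999)

# Feature: the index 0..999.
leng = [x for x in range(len(x1))]

# Column vectors of shape (1000, 1).
a = np.array(leng).reshape(len(leng),1)
b = np.array(x1).reshape(len(leng),1)

t1,t2,y1,y2 = train_test_split(a,b)

l = LR()
l.fit(t1,y1)
print(l.score(t2,y2))
print(l.predict(t2))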

The dependent variable takes only the values 1001 and 999, and the independent variable is a linearly increasing index. I expected linear regression to score this at 1.0; however, my score is below 0. Any ideas why? I guess I must be doing something wrong.

Upvotes: 0

Views: 1326

Answers (1)

Brad Solomon

Reputation: 40918

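Because there is no linear relationship between a and b: b only alternates between 999 and 1001, so it neither rises nor falls as a grows. The .score method returns the R-squared, and here it is approximately 0, as it should be. On held-out data it can dip slightly below 0, because the fitted line can predict the test set marginally worse than the test set's own mean would. Manually,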

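# l is the fitted LinearRegression from the question
predictions = l.predict(t2)
# explained sum of squares: test predictions around the test mean
ess = np.sum(np.square(predictions - y2.mean()))
# total sum of squares of the full target b
sst = np.sum(np.square(b - b.mean()))

rsquared = ess / sst; rsquared
Out[31]: 0.0040910187945010796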
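
As a cross-check, here is a minimal sketch that computes R-squared in the form scikit-learn documents for .score, 1 - SS_res / SS_tot on the data passed in, and compares it with the value .score returns. It also shows that the fitted line is nearly flat, and that the pattern becomes linear once the parity of the index is used as the feature. The names X, y, parity and the random_state=0 argument are my own assumptions, added only so the run is self-contained and repeatable; they are not from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data as in the question: y alternates 1001, 999 along the index.
X = np.arange(1000).reshape(-1, 1)
y = np.where(X[:, 0] % 2 == 0, 1001, 999)

# random_state is an assumption, used only to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# The fitted line is almost flat and sits close to the mean of y.
slope = model.coef_[0]
intercept = model.intercept_
print("slope:", slope, "intercept:", intercept)

# R-squared as 1 - SS_res / SS_tot, both computed on the test data.
pred = model.predict(X_test)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
r2_score = model.score(X_test, y_test)
print("manual R^2:", r2_manual, "score():", r2_score)

# With the parity of the index as the feature, the relationship is exactly
# linear: y = 999 + 2 * parity, where parity is 1 for even indices.
parity = (X % 2 == 0).astype(int)
p_train, p_test, yp_train, yp_test = train_test_split(parity, y, random_state=0)
parity_score = LinearRegression().fit(p_train, yp_train).score(p_test, yp_test)
print("R^2 with parity feature:", parity_score)
```

The raw index carries no linear information about y, so the best straight line is close to "always predict 1000", which is what an R-squared near 0 means. The parity feature is just one illustrative way to expose the alternation to a linear model.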

Upvotes: 2
