Reputation: 348
import numpy as np
from sklearn.model_selection import train_test_split  # train_test_split moved here from the old cross_validation module
from sklearn.linear_model import LinearRegression as LR

# Target alternates between 1001 (even index) and 999 (odd index)
x1 = []
for i in range(1000):
    if i % 2 == 0:
        x1.append(1001)
    else:
        x1.append(999)

# Feature is just the index 0..999, reshaped to a column vector
leng = list(range(len(x1)))
a = np.array(leng).reshape(len(leng), 1)
b = np.array(x1).reshape(len(leng), 1)

t1, t2, y1, y2 = train_test_split(a, b)
l = LR()
l.fit(t1, y1)
print(l.score(t2, y2))   # R^2 on the held-out split
print(l.predict(t2))
The dependent values are only 1001 or 999, plotted against a linearly increasing independent axis. I thought linear regression should score this with 1.0; however, my score is below 0. Any ideas why? I guess I must be doing something wrong.
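For comparison, a minimal sketch, assuming a target that really is an exact linear function of the index, where the score does come out as 1.0:

import numpy as np
from sklearn.linear_model import LinearRegression

# Perfectly linear target: y = 3*x + 7, so the model can fit it exactly
X = np.arange(1000).reshape(-1, 1)
y = 3 * X.ravel() + 7

print(LinearRegression().fit(X, y).score(X, y))  # prints 1.0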
Upvotes: 0
Views: 1326
Reputation: 40918
Because there is no linear relationship or obvious pattern between a and b. The .score method returns an R-squared, and here that R-squared is essentially 0, as it should be. (On a held-out split it can even come out slightly negative, because sklearn computes R-squared as 1 - SS_res/SS_tot, and a fit that predicts worse than the plain mean of the test targets pushes that below 0.) Manually,
predictions = l.predict(t2)                       # l is the fitted LinearRegression from the question
ssr = np.sum(np.square(predictions - y2.mean()))  # variation the fitted line explains (regression sum of squares)
sst = np.sum(np.square(b - b.mean()))             # total variation in the target
rsquared = ssr / sst; rsquared
Out[31]: 0.0040910187945010796
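To connect this back to the score reported in the question, here is a minimal sketch (variable names are my own) that rebuilds the same kind of data and computes R-squared the way LinearRegression.score defines it, 1 - SS_res/SS_tot on the data you pass in; on a random test split that value can land slightly below 0 whenever the fitted line predicts worse than the plain mean of the test targets:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Same shape of data as the question: index as feature, target alternating 999/1001
X = np.arange(1000).reshape(-1, 1)
y = np.where(np.arange(1000) % 2 == 0, 1001, 999)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

pred = model.predict(X_test)
ss_res = np.sum((y_test - pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total sum of squares around the test mean
print(model.score(X_test, y_test))               # sklearn's R^2 on the test split
print(1 - ss_res / ss_tot)                       # identical value, and it can dip below 0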
Upvotes: 2