Reputation: 5122
I am trying to perform a linear regression on the following data:
X = [[ 1 26]
[ 2 26]
[ 3 26]
[ 4 26]
[ 5 26]
[ 6 26]
[ 7 26]
[ 8 26]
[ 9 26]
[10 26]
[11 26]
[12 26]
[13 26]
[14 26]
[15 26]
[16 26]
[17 26]
[18 26]
[19 26]
[20 26]
[21 26]
[22 26]
[23 26]
[24 26]
[25 26]
[26 26]
[27 26]
[28 26]
[29 26]
[30 26]
[31 26]
[32 26]
[33 26]
[34 26]
[35 26]
[36 26]
[37 26]
[38 26]
[39 26]
[40 26]
[41 26]
[42 26]
[43 26]
[44 26]
[45 26]
[46 26]
[47 26]
[48 26]
[49 26]
[50 26]
[51 26]
[52 26]
[53 26]
[54 26]
[55 26]
[56 26]
[57 26]
[58 26]
[59 26]
[60 26]
[61 26]
[62 26]
[63 26]
[64 26]
[65 26]
[66 26]
[67 26]
[68 26]
[69 26]]
Y = [ 192770 14817993 1393537 437541 514014 412468 509393 172715
329806 425876 404031 524371 362817 692020 585431 446286
744061 458805 330027 495654 459060 734793 701697 663319
750496 525311 1045502 250641 500360 507594 456444 478666
431382 495689 458200 349161 538770 355879 535924 549858
611428 517146 239513 354071 342354 698360 467248 500903
625170 404462 1057368 564703 700988 1352634 727453 782708
1023673 1046348 1175588 698072 605187 684739 884551 1067267
728643 790098 580151 340890 299185]
I am trying to plot the result to see the regression line using:
from sklearn import linear_model
import matplotlib.pyplot as plt
regr = linear_model.LinearRegression()
regr.fit(X, Y)
plt.scatter(X[:,0], Y, color='black')
plt.plot(X[:,0], regr.predict(X), color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
This prints:
Coefficients: array([-34296.90306122, 0. ])
Residual sum of squares: 1414631501323.43
Variance score: -17.94
I am trying to predict a value that is already in the training data:
pred = regr.predict([[49, 26]])
print(pred)
The result is [-19155.16326531], whereas the actual value is 625170.
What am I doing wrong?
Please note that the value 26 comes from a larger array; I have sliced the data down to a small portion so as to train and predict on 26. Similarly, X[:,0] might not contain continuous values, since it also comes from a larger array. By "array" I mean a NumPy array.
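Because X and Y were sliced out of larger arrays, it may be worth confirming that they still line up row for row before fitting; a shifted or mismatched slice could produce a prediction this far off. A minimal sketch, assuming X and Y are NumPy arrays holding the data shown above; `check_alignment` is a hypothetical helper, not a scikit-learn function:

```python
import numpy as np

# Arrays rebuilt from the question: column 0 runs 1..69, column 1 is the constant 26
X = np.column_stack([np.arange(1, 70), np.full(69, 26)])
Y = np.array([
    192770, 14817993, 1393537, 437541, 514014, 412468, 509393, 172715,
    329806, 425876, 404031, 524371, 362817, 692020, 585431, 446286,
    744061, 458805, 330027, 495654, 459060, 734793, 701697, 663319,
    750496, 525311, 1045502, 250641, 500360, 507594, 456444, 478666,
    431382, 495689, 458200, 349161, 538770, 355879, 535924, 549858,
    611428, 517146, 239513, 354071, 342354, 698360, 467248, 500903,
    625170, 404462, 1057368, 564703, 700988, 1352634, 727453, 782708,
    1023673, 1046348, 1175588, 698072, 605187, 684739, 884551, 1067267,
    728643, 790098, 580151, 340890, 299185,
])


def check_alignment(X, Y, row, expected):
    """Hypothetical helper: check shapes, then check that `row` pairs with `expected`."""
    # scikit-learn expects X of shape (n_samples, n_features) and Y of shape (n_samples,)
    assert X.ndim == 2, "X must be 2D"
    assert Y.ndim == 1, "Y must be 1D"
    assert X.shape[0] == Y.shape[0], "X and Y must have the same number of rows"
    # Find the single sample whose features equal `row` and compare its target
    matches = np.where((X == row).all(axis=1))[0]
    assert len(matches) == 1, "expected exactly one matching row"
    return Y[matches[0]] == expected


print(X.shape, Y.shape)                         # (69, 2) (69,)
print(check_alignment(X, Y, [49, 26], 625170))  # True
```

If the check fails on the real, sliced arrays, the problem is in the slicing rather than in the regression.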
Upvotes: 2
Views: 8127
Reputation: 73
If you pass a single scalar (float) value to predict, it may not work. I first tried the code below, but it didn't work:
lin_reg.predict(6.5)
The solution I found was to pass a 2D array (one row with one feature):
lin_reg.predict([[6.5]])
Try it and see whether that works for you too.
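The reason is that scikit-learn's `predict` expects a 2D array-like of shape (n_samples, n_features), even for a single sample with a single feature. A minimal sketch, assuming `lin_reg` was trained on one feature; the toy data (y = 2x + 1) is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data with a single feature (made up for illustration): y = 2x + 1
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (4, 1): 4 samples, 1 feature
y_train = np.array([3.0, 5.0, 7.0, 9.0])

lin_reg = LinearRegression().fit(x_train, y_train)

# A bare scalar is rejected because predict expects (n_samples, n_features)
scalar_rejected = False
try:
    lin_reg.predict(6.5)
except ValueError as err:
    scalar_rejected = True
    print("scalar input rejected:", err)

# One sample with one feature: wrap the value in a nested list (shape 1x1)
print(lin_reg.predict([[6.5]]))  # approximately [14.]

# Several values at once: reshape a 1D array into a single column
values = np.array([6.5, 10.0])
print(lin_reg.predict(values.reshape(-1, 1)))  # approximately [14. 21.]
```

The same rule explains the question's `regr.predict([[49, 26]])`: one sample, two features, hence one row with two entries.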
Upvotes: 0
Reputation: 437
You are probably messing with the input arrays before the plot. Given the information in your question, the regression does return a result close to your expected answer of 625170.
from sklearn import linear_model
# your input arrays
x = [[a, 26] for a in range(1, 70, 1)]
y = [192770, 14817993,1393537, 437541, 514014, 412468, 509393, 172715, 329806, 425876, 404031, 524371, 362817, 692020, 585431, 446286, 744061, 458805, 330027, 495654, 459060, 734793, 701697, 663319, 750496, 525311,1045502, 250641, 500360, 507594, 456444, 478666, 431382, 495689, 458200, 349161, 538770, 355879, 535924, 549858, 611428, 517146, 239513, 354071, 342354, 698360, 467248, 500903, 625170, 404462,1057368, 564703, 700988,1352634, 727453, 782708, 1023673,1046348,1175588, 698072, 605187, 684739, 884551,1067267, 728643, 790098, 580151, 340890, 299185]
# your code for regression
regr = linear_model.LinearRegression()
regr.fit(x, y)
# the correct coef is different from your findings
print(regr.coef_)
This returns: array([-13139.72031421, 0. ])
Trying the prediction regr.predict([[49, 26]]) returns array([ 611830.33589088]), which is close to the answer you expected.
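One more hint that the arrays in the question were mismatched: the "Variance score" printed there is presumably R², as returned by `regr.score`. For ordinary least squares with an intercept, R² on the same data used in `fit` cannot be negative, because the fitted line always does at least as well as predicting the mean. A score of -17.94 therefore suggests the score was computed on arrays different from those the model was fitted on. A minimal sketch that rebuilds the arrays from the question and checks this:

```python
import numpy as np
from sklearn import linear_model

# Arrays rebuilt from the question
X = np.column_stack([np.arange(1, 70), np.full(69, 26)])
Y = np.array([
    192770, 14817993, 1393537, 437541, 514014, 412468, 509393, 172715,
    329806, 425876, 404031, 524371, 362817, 692020, 585431, 446286,
    744061, 458805, 330027, 495654, 459060, 734793, 701697, 663319,
    750496, 525311, 1045502, 250641, 500360, 507594, 456444, 478666,
    431382, 495689, 458200, 349161, 538770, 355879, 535924, 549858,
    611428, 517146, 239513, 354071, 342354, 698360, 467248, 500903,
    625170, 404462, 1057368, 564703, 700988, 1352634, 727453, 782708,
    1023673, 1046348, 1175588, 698072, 605187, 684739, 884551, 1067267,
    728643, 790098, 580151, 340890, 299185,
])

regr = linear_model.LinearRegression()
regr.fit(X, Y)

# R^2 on the training data; with an intercept it lies in [0, 1]
r2 = regr.score(X, Y)

# The same quantity by hand: 1 - SS_res / SS_tot
residuals = Y - regr.predict(X)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("coef:", regr.coef_)
print("R^2 via score():", r2)
print("R^2 by hand:", r2_manual)
print("prediction for [49, 26]:", regr.predict([[49, 26]]))
```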
Upvotes: 1
Reputation: 4224
As SAMO said in his comment, it's not clear what your data structures are. Assuming you have two features in X and a target Y, if you convert X and Y to numpy arrays your code works as expected.
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
x1 = range(1, 70)
x2 = [26]*69
X = np.column_stack([x1, x2])
y = ''' 192770 14817993 1393537 437541 514014 412468 509393 172715
329806 425876 404031 524371 362817 692020 585431 446286
744061 458805 330027 495654 459060 734793 701697 663319
750496 525311 1045502 250641 500360 507594 456444 478666
431382 495689 458200 349161 538770 355879 535924 549858
611428 517146 239513 354071 342354 698360 467248 500903
625170 404462 1057368 564703 700988 1352634 727453 782708
1023673 1046348 1175588 698072 605187 684739 884551 1067267
728643 790098 580151 340890 299185'''
Y = np.array([int(v) for v in y.split()])  # list comprehension also works on Python 3, where map() returns an iterator
regr = linear_model.LinearRegression()
regr.fit(X, Y)
plt.scatter(X[:,0], Y, color='black')
plt.plot(X[:,0], regr.predict(X), color='blue',
linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
print(regr.predict([[49, 26]]))
# 611830.33589088
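Since the second column is always 26, it carries no information that the intercept does not already provide, which is consistent with its coefficient coming out as 0 above. A minimal sketch under that reading, comparing the two-column fit against a fit on the first column alone; the slope, intercept, and predictions should agree:

```python
import numpy as np
from sklearn import linear_model

# Arrays rebuilt from the question
X = np.column_stack([np.arange(1, 70), np.full(69, 26)])
Y = np.array([
    192770, 14817993, 1393537, 437541, 514014, 412468, 509393, 172715,
    329806, 425876, 404031, 524371, 362817, 692020, 585431, 446286,
    744061, 458805, 330027, 495654, 459060, 734793, 701697, 663319,
    750496, 525311, 1045502, 250641, 500360, 507594, 456444, 478666,
    431382, 495689, 458200, 349161, 538770, 355879, 535924, 549858,
    611428, 517146, 239513, 354071, 342354, 698360, 467248, 500903,
    625170, 404462, 1057368, 564703, 700988, 1352634, 727453, 782708,
    1023673, 1046348, 1175588, 698072, 605187, 684739, 884551, 1067267,
    728643, 790098, 580151, 340890, 299185,
])

# Fit with both columns; the second one is constant
two_col = linear_model.LinearRegression().fit(X, Y)

# Fit with only the first column; the constant is absorbed by intercept_
one_col = linear_model.LinearRegression().fit(X[:, :1], Y)

print(two_col.coef_)  # second coefficient is (numerically) zero
print(one_col.coef_)
print(two_col.predict([[49, 26]]), one_col.predict([[49]]))
```

So, at least for this slice, the model is effectively a one-feature regression on X[:,0].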
Upvotes: 2