Here is the link to the dataset I used: Dataset
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Let's begin with polynomial regression
df = pd.read_excel('enes.xlsx')
X=pd.DataFrame(df['hacim'])
Y=pd.DataFrame(df['delay'])
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, Y)
plt.scatter(X, Y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('X Vs Y')
plt.xlabel('hacim')
plt.ylabel('delay')
plt.show()
The last plt.show() displays a graph with many crisscrossing lines instead of the single polynomial regression curve I wanted. What is wrong, and how can I fix it?
Here is the data:
,hacim,delay
0,815,1.44
1,750,1.11
2,321,2.37
3,1021,1.44
4,255,1.09
5,564,1.61
6,1455,15.27
7,525,2.7
8,1118,106.98
9,1036,3.47
10,396,1.34
11,1485,21.49
12,1017,12.22
13,1345,2.72
14,312,1.71
15,742,33.79
16,1100,39.62
17,1445,4.88
18,847,1.55
19,991,1.82
20,1296,10.77
21,854,1.81
22,1198,61.9
23,1162,8.22
24,1463,42.25
25,1272,4.31
26,745,2.36
27,521,2.14
28,1247,94.33
29,732,12.55
30,489,1.05
31,1494,12.78
32,591,3.18
33,257,1.18
34,602,4.24
35,335,2.06
36,523,3.63
37,752,7.61
38,349,1.76
39,771,0.79
40,855,39.08
41,948,3.95
42,1378,97.28
43,598,2.69
44,558,1.67
45,634,34.69
46,1146,12.22
47,1087,1.74
48,628,1.03
49,711,3.34
50,1116,7.27
51,748,1.09
52,1212,14.16
53,434,1.42
54,1046,8.25
55,568,1.33
56,894,2.61
57,1041,4.79
58,801,1.84
59,1387,11.5
60,1171,161.21
61,734,2.43
62,1471,17.42
63,461,1.42
64,751,2.36
65,898,2.4
66,593,1.74
67,942,3.39
68,825,1.09
69,715,20.23
70,725,5.43
71,1128,7.57
72,1348,4.49
73,1393,9.77
74,1379,97.76
75,859,2.59
76,612,15.98
77,1495,8.22
78,887,1.85
79,867,38.65
80,1353,1.6
81,851,60.25
82,1079,24.05
83,1100,25.58
84,638,1.23
85,1115,1.94
86,1443,4.79
87,1421,10.33
88,1279,7.29
89,1176,173.44
90,315,1.53
91,1019,34.03
92,1337,48.67
93,576,28.83
94,919,2.88
95,361,1.5
96,989,1.47
97,1286,32.11
Let's use pandas plotting; it is much easier:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_excel('enes.xlsx')
X = pd.DataFrame(df['hacim'])
Y = pd.DataFrame(df['delay'])

poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, Y)

# predict() returns an (n, 1) array because Y is a DataFrame; flatten it into one column
df['y_pred'] = lin_reg_2.predict(X_poly).ravel()

# Sort by hacim so the line is drawn left to right instead of zig-zagging
df = df.sort_values('hacim')
ax = df.plot.scatter('hacim', 'delay')
df.plot('hacim', 'y_pred', ax=ax, color='r')
plt.title('X Vs Y')
plt.xlabel('hacim')
plt.ylabel('delay')
plt.show()
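The stray lines come from unsorted data: plt.plot connects the points in the order they are given, so when hacim is not sorted the line jumps back and forth between x values. Sorting by hacim before plotting draws a single curve.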
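Outside pandas, the same fix is to sort the x values and reorder the predictions with the same permutation before calling plt.plot. A minimal NumPy sketch of that idea; the helper name sort_for_line_plot and the placeholder predictions are hypothetical, not taken from the question:

```python
import numpy as np

def sort_for_line_plot(x, y):
    """Reorder x ascending and apply the same permutation to y, so each (x, y) pair stays together."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    order = np.argsort(x, kind="stable")
    return x[order], y[order]

# First five hacim values from the question, in file order (unsorted)
hacim = np.array([815, 750, 321, 1021, 255])
# Placeholder predictions for illustration only; in the question they come from lin_reg_2.predict(...)
y_pred = np.array([1.5, 1.2, 2.0, 3.1, 0.9])

xs, ys = sort_for_line_plot(hacim, y_pred)
print(xs)  # [ 255  321  750  815 1021]
```

Passing xs and ys to plt.plot then draws one left-to-right curve instead of a zig-zag.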
Alternatively, if you only want to show the predicted values, plot them as markers with no connecting line:
plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color='blue', marker='o', linestyle='none')
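A further option, assuming a smooth display curve is the goal, is to evaluate the fitted polynomial on an evenly spaced grid between the smallest and largest hacim; such a grid is sorted by construction. This sketch swaps in NumPy's Polynomial.fit for the scikit-learn objects (both are ordinary least-squares fits of a degree-4 polynomial) and uses only the first ten rows of the data; the grid size of 200 is an arbitrary choice:

```python
import numpy as np
from numpy.polynomial import Polynomial

# First ten rows of the question's data, used here only to keep the sketch short
hacim = np.array([815, 750, 321, 1021, 255, 564, 1455, 525, 1118, 1036], dtype=float)
delay = np.array([1.44, 1.11, 2.37, 1.44, 1.09, 1.61, 15.27, 2.7, 106.98, 3.47])

# Degree-4 least-squares fit (same model family as PolynomialFeatures(degree=4) + LinearRegression);
# Polynomial.fit maps x to [-1, 1] internally, which keeps the fit well conditioned
model = Polynomial.fit(hacim, delay, deg=4)

# An evenly spaced grid is already sorted, so plotting it gives one smooth curve
grid = np.linspace(hacim.min(), hacim.max(), 200)
curve = model(grid)
```

To draw it, call plt.scatter(hacim, delay, color='red') followed by plt.plot(grid, curve, color='blue'). With the scikit-learn objects from the question, the matching prediction step would be lin_reg_2.predict(poly_reg.transform(pd.DataFrame({'hacim': grid}))); wrapping the grid in a DataFrame with the original column name keeps the feature names consistent with the fit.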