Enes Senel

Reputation: 41

plt.plot draws multiple curves instead of a single curve

Here is the link to the dataset I used: Dataset

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Let's begin with polynomial regression
# 'hacim' must stay a regular column, since it is selected by name below
df = pd.read_excel('enes.xlsx')
X = pd.DataFrame(df['hacim'])
Y = pd.DataFrame(df['delay'])

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, Y)

plt.scatter(X, Y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('X Vs Y')
plt.xlabel('hacim')
plt.ylabel('delay')
plt.show()

The final plt.show() displays a graph with many lines instead of the single polynomial regression curve I wanted. What is wrong, and how can I fix it?

Data

,hacim,delay
0,815,1.44
1,750,1.11
2,321,2.37
3,1021,1.44
4,255,1.09
5,564,1.61
6,1455,15.27
7,525,2.7
8,1118,106.98
9,1036,3.47
10,396,1.34
11,1485,21.49
12,1017,12.22
13,1345,2.72
14,312,1.71
15,742,33.79
16,1100,39.62
17,1445,4.88
18,847,1.55
19,991,1.82
20,1296,10.77
21,854,1.81
22,1198,61.9
23,1162,8.22
24,1463,42.25
25,1272,4.31
26,745,2.36
27,521,2.14
28,1247,94.33
29,732,12.55
30,489,1.05
31,1494,12.78
32,591,3.18
33,257,1.18
34,602,4.24
35,335,2.06
36,523,3.63
37,752,7.61
38,349,1.76
39,771,0.79
40,855,39.08
41,948,3.95
42,1378,97.28
43,598,2.69
44,558,1.67
45,634,34.69
46,1146,12.22
47,1087,1.74
48,628,1.03
49,711,3.34
50,1116,7.27
51,748,1.09
52,1212,14.16
53,434,1.42
54,1046,8.25
55,568,1.33
56,894,2.61
57,1041,4.79
58,801,1.84
59,1387,11.5
60,1171,161.21
61,734,2.43
62,1471,17.42
63,461,1.42
64,751,2.36
65,898,2.4
66,593,1.74
67,942,3.39
68,825,1.09
69,715,20.23
70,725,5.43
71,1128,7.57
72,1348,4.49
73,1393,9.77
74,1379,97.76
75,859,2.59
76,612,15.98
77,1495,8.22
78,887,1.85
79,867,38.65
80,1353,1.6
81,851,60.25
82,1079,24.05
83,1100,25.58
84,638,1.23
85,1115,1.94
86,1443,4.79
87,1421,10.33
88,1279,7.29
89,1176,173.44
90,315,1.53
91,1019,34.03
92,1337,48.67
93,576,28.83
94,919,2.88
95,361,1.5
96,989,1.47
97,1286,32.11

Upvotes: 1

Views: 85

Answers (1)

Scott Boston

Reputation: 153500

Let's use pandas plotting, which is much easier:

X=pd.DataFrame(df['hacim'])
Y=pd.DataFrame(df['delay'])

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, Y)

df['y_pred'] = lin_reg_2.predict(poly_reg.fit_transform(X))
df = df.sort_values('hacim')
ax = df.plot.scatter('hacim','delay')
df.plot('hacim', 'y_pred', ax=ax, color='r')
plt.title('X Vs Y')
plt.xlabel('hacim')
plt.ylabel('delay')
plt.show()

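The root cause of the scattered lines was unsorted data. A line plot connects the points in the order they are given, so with unsorted x values the line jumps back and forth across the graph instead of tracing a single curve.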
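
If you would rather stay with plain matplotlib, the same fix can be sketched by reordering the x values and the predictions together before calling plot. This is only a minimal sketch: it uses the first eight rows of the question's data as NumPy arrays instead of reading the Excel file, and the names order, x_sorted and y_sorted are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# First eight rows of the question's data, in their original (unsorted) order.
hacim = np.array([815, 750, 321, 1021, 255, 564, 1455, 525], dtype=float)
delay = np.array([1.44, 1.11, 2.37, 1.44, 1.09, 1.61, 15.27, 2.7])

X = hacim.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
poly_reg = PolynomialFeatures(degree=4)
lin_reg = LinearRegression().fit(poly_reg.fit_transform(X), delay)
y_pred = lin_reg.predict(poly_reg.transform(X))

# Reorder x and the predictions by ascending x so the line runs left to right.
order = np.argsort(hacim)
x_sorted = hacim[order]
y_sorted = y_pred[order]

fig, ax = plt.subplots()
ax.scatter(hacim, delay, color='red')      # scatter does not depend on order
ax.plot(x_sorted, y_sorted, color='blue')  # one curve through the sorted points
ax.set_xlabel('hacim')
ax.set_ylabel('delay')
```

Call plt.show() afterwards to display the figure. Sorting with np.argsort keeps each prediction paired with its own x value, which sorting the two arrays separately would not.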

Alternatively, you could leave the data unsorted and draw the predictions as markers with no connecting line:

plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue', marker='o', linestyle='none')

Upvotes: 1
