Reputation: 101
I am trying to figure out how to determine the trend in the slopes of several best-fit lines fitted to sets of points. Once I have that trend, I want to plot several more lines that follow it in the same plot. For example:
This plot is what I want to reproduce, but I am not sure how to do it. It shows several best-fit lines through sets of points; the lines have different slopes and all intersect at x = 6. After those, it shows several more lines whose slopes follow the trend of the fitted slopes. I assume I can do something similar starting from this code, but I am unsure how to modify it to do what I want.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000
# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])
# Now add on a line with a fixed slope of 0.03
slope = 0.03
# A line with a fixed slope can intercept the axis
# anywhere so we're going to have it go through 0,0
x_0 = 0
y_0 = 0
# And we'll have the line stop at x = 5000
x_1 = 5000
y_1 = slope * (x_1 - x_0) + y_0
# Draw these two points with big triangles to make it clear
# where they lie
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')
# And now connect them
ax.plot([x_0, x_1], [y_0, y_1], c='r')
plt.show()
Upvotes: 2
Views: 2637
Reputation: 72
I modified your code slightly. Basically, what you need is a piecewise function: below a certain x value each line has a different slope, and all of them meet at x = 3000 (at y = 0); beyond that point the slope is just 0.
The plot is as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000
# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])
# Slopes of the sloped segments (one line per value) and of the flat segment
slope1 = np.arange(-0.05, 0, 0.01)  # -0.05, -0.04, -0.03, -0.02, -0.01
slope2 = 0
# Every sloped segment starts at x = 0 and ends at the shared point (3000, 0)
x_0 = 0
x_1 = 3000
y_1 = 0
for slope in slope1:
    # Work backwards from (x_1, y_1) to find where this line starts at x = 0
    y_0 = y_1 - slope * (x_1 - x_0)
    ax.plot([x_0, x_1], [y_0, y_1], c='r')
    # Draw both end points with big triangles to make it clear where they lie
    ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')
# After x = 3000 the slope is 0, so one flat segment runs on to x = 5000
x_2 = 5000
y_2 = slope2 * (x_2 - x_1) + y_1
ax.plot([x_1, x_2], [y_1, y_2], c='r')
plt.show()
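The same piecewise idea can also be written as a function of x instead of separate segments. This is a minimal sketch using NumPy's np.piecewise; the function name piecewise_line and its default break point (3000, 0) are illustrative choices of mine that mirror x_1 and y_1 in the code above.

```python
import numpy as np


def piecewise_line(x, slope, x_break=3000.0, y_break=0.0):
    """Sloped line up to x_break, then flat (slope 0) from x_break onwards.

    Every line passes through (x_break, y_break), like the segments that
    meet at x_1 = 3000, y_1 = 0 in the code above (hypothetical helper).
    """
    x = np.asarray(x, dtype=float)
    return np.piecewise(
        x,
        [x < x_break, x >= x_break],
        # Callable for the sloped part, scalar (constant) for the flat part
        [lambda t: y_break + slope * (t - x_break), y_break],
    )


# Evaluate one curve per slope on a dense grid of x values
x = np.linspace(0, 5000, 501)
curves = [piecewise_line(x, s) for s in np.arange(-0.05, 0, 0.01)]
```

Each curve can then be drawn with a single ax.plot(x, curve, c='r') call instead of two separate segments per line.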
Upvotes: 1
Reputation: 25400
The value of y_1 can be found by using the equation of a straight line given by your slope and y_0:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Age': np.random.rand(25) * 160})
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])
slope = 0.03
x_0 = 0
y_0 = 0
x_1 = 5000
y_1 = (slope * x_1) + y_0 # equation of a straight line: y = mx + c
ax.plot([x_0, x_1], [y_0, y_1], marker='^', markersize=10, c='r')
plt.show()
Which produces the following graph:
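Note that slope * x_1 + y_0 matches the general point-slope form y_1 = y_0 + slope * (x_1 - x_0) only because x_0 = 0 here. If a line should start somewhere other than x = 0, a small helper keeps the general form; the name end_point is hypothetical, chosen for this sketch.

```python
import numpy as np


def end_point(slope, x_0, y_0, x_1):
    """y value at x_1 of the line through (x_0, y_0) with the given slope.

    Point-slope form: y_1 = y_0 + slope * (x_1 - x_0). With x_0 = 0 it
    reduces to y_1 = slope * x_1 + y_0, the form used above.
    """
    return y_0 + slope * (x_1 - x_0)


y_1 = end_point(0.03, 0, 0, 5000)            # same as slope * x_1 + y_0
y_shifted = end_point(0.03, 1000, 10, 5000)  # line starting at (1000, 10)
ends = end_point(np.linspace(0.01, 0.05, 5), 0, 0, 5000)  # one y_1 per slope
```

Because it uses only arithmetic, passing a NumPy array of slopes returns one end point per slope, so the same helper covers the multiple-line case.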
In order to plot multiple lines, first create an array/list of gradients that will be used and then follow the same steps:
df = pd.DataFrame({'Age': np.random.rand(25) * 160})
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])
x_0 = 0
y_0 = 0
x_1 = 5000
slopes = np.linspace(0.01, 0.05, 5)  # create an array containing the gradients
new_y = (slopes * x_1) + y_0  # find the corresponding y values at x = 5000
for slope, y_end in zip(slopes, new_y):
    # Format the label so the legend shows clean values such as 0.03
    ax.plot([x_0, x_1], [y_0, y_end], marker='^', markersize=10, label=f"{slope:.2f}")
plt.legend(title="Gradients")
plt.show()
This produces the following figure:
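The gradients above are chosen by hand. To derive the new slopes from the trend in the fitted ones, as the question asks, here is a minimal sketch, assuming the slopes change roughly linearly from one fitted line to the next: fit each group of points with np.polyfit(x, y, 1), fit a straight-line trend to the resulting slopes against the line index, and extend that trend with np.polyval. The helper names, the demo data and the pivot point (x = 6 from the question, y = 2 here) are all hypothetical. Least-squares lines will not generally pass exactly through one common point, so the sketch simply draws the new lines through a chosen pivot.

```python
import numpy as np


def fitted_slopes(groups):
    """Least-squares slope of each (x, y) group, via a degree-1 np.polyfit."""
    return np.array([np.polyfit(x, y, 1)[0] for x, y in groups])


def extrapolate_slopes(slopes, n_new):
    """Fit a linear trend to slope vs. line index and extend it n_new steps.

    Assumes the slopes change roughly linearly from one line to the next.
    """
    idx = np.arange(len(slopes))
    trend = np.polyfit(idx, slopes, 1)  # [change in slope per line, intercept]
    new_idx = np.arange(len(slopes), len(slopes) + n_new)
    return np.polyval(trend, new_idx)


def lines_through_pivot(slopes, x_pivot, y_pivot, x):
    """y values of lines with the given slopes through (x_pivot, y_pivot).

    Returns an array with one row per slope and one column per x value.
    """
    x = np.asarray(x, dtype=float)
    return y_pivot + np.outer(slopes, x - x_pivot)


# Hypothetical demo data: three groups of points lying on lines through (6, 2)
x = np.linspace(0, 10, 11)
groups = [(x, 2 + m * (x - 6)) for m in (0.5, 1.0, 1.5)]

slopes = fitted_slopes(groups)
new_slopes = extrapolate_slopes(slopes, 2)
new_lines = lines_through_pivot(new_slopes, 6, 2, x)
```

Each row of new_lines can be drawn with ax.plot(x, row). Alternatively, Matplotlib 3.3 and later provide ax.axline((x_pivot, y_pivot), slope=m), which draws an unbounded line through a point with a given slope.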
Upvotes: 4