Reputation: 31

Is there a way to plot a 'tight' best fit line through scattered X, Y coordinates?

I'm having trouble trying to fit an average curved line through my data in order to find the length. I have a lot of X, Y points in a large pandas dataframe that looks something like:

x = np.asarray([731501.13, 731430.24, 731360.29, 731289.36, 731909.72, 731827.89,
   731742.  , 731657.74, 731577.95, 731502.64, 731430.39, 731359.12,
   731287.3 , 731214.21, 732015.59, 731966.88, 731902.67, 731826.31,
   731743.79, 731660.94, 731581.29, 731505.4 , 731431.95, 732048.71,
   732026.66, 731995.46, 731952.18, 731894.29, 731823.58, 731745.16,
   732149.61, 732091.53, 732052.98, 732026.82, 732005.17, 731977.63,
   732691.84, 732596.62, 732499.45, 732401.62, 732306.18, 732218.35,
   732141.82, 732080.91, 732038.21, 732009.08, 733023.08, 732951.99,
   732873.32, 732787.51])

y = np.asarray([7873771.69, 7873705.34, 7873638.03, 7873571.73, 7874082.33,
   7874027.2 , 7873976.22, 7873923.58, 7873866.35, 7873804.53,
   7873739.58, 7873673.62, 7873608.23, 7873544.15, 7874286.21,
   7874197.15, 7874123.96, 7874063.21, 7874008.78, 7873954.69,
   7873897.31, 7873836.09, 7873772.38, 7874564.62, 7874448.23,
   7874341.23, 7874246.59, 7874166.93, 7874100.4 , 7874041.77,
   7874912.56, 7874833.09, 7874733.62, 7874621.43, 7874504.65,
   7874393.89, 7875225.26, 7875183.85, 7875144.42, 7875105.69,
   7875064.49, 7875015.5 , 7874954.94, 7874878.36, 7874783.13,
   7874674.  , 7875476.18, 7875410.05, 7875351.67, 7875300.61])

The x and y values are map-view coordinates, and I want to calculate the length of the line they trace. I can code the Euclidean distance, but because the points are scattered rather than ordered one after another, I'm having trouble fitting a curve that follows them. I've tried polyfit, but it mainly produces a straight line even with a higher deg, e.g.:

import numpy as np
import matplotlib.pyplot as plt

# fit a degree-10 polynomial y(x) to the points
z = np.polyfit(x, y, 10)
p = np.poly1d(z)
plt.scatter(x,y, marker='x')
plt.scatter(x, p(x), marker='.')

plt.show()

This is to demonstrate what I mean: [image]

Any help would be greatly appreciated!

Upvotes: 3

Answers (2)

mikuszefski

Reputation: 4043

This would be an empiric function fitting your data:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit


x = np.asarray([731501.13, 731430.24, 731360.29, 731289.36, 731909.72, 731827.89,
   731742.  , 731657.74, 731577.95, 731502.64, 731430.39, 731359.12,
   731287.3 , 731214.21, 732015.59, 731966.88, 731902.67, 731826.31,
   731743.79, 731660.94, 731581.29, 731505.4 , 731431.95, 732048.71,
   732026.66, 731995.46, 731952.18, 731894.29, 731823.58, 731745.16,
   732149.61, 732091.53, 732052.98, 732026.82, 732005.17, 731977.63,
   732691.84, 732596.62, 732499.45, 732401.62, 732306.18, 732218.35,
   732141.82, 732080.91, 732038.21, 732009.08, 733023.08, 732951.99,
   732873.32, 732787.51])/732 -1000

y = np.asarray([7873771.69, 7873705.34, 7873638.03, 7873571.73, 7874082.33,
   7874027.2 , 7873976.22, 7873923.58, 7873866.35, 7873804.53,
   7873739.58, 7873673.62, 7873608.23, 7873544.15, 7874286.21,
   7874197.15, 7874123.96, 7874063.21, 7874008.78, 7873954.69,
   7873897.31, 7873836.09, 7873772.38, 7874564.62, 7874448.23,
   7874341.23, 7874246.59, 7874166.93, 7874100.4 , 7874041.77,
   7874912.56, 7874833.09, 7874733.62, 7874621.43, 7874504.65,
   7874393.89, 7875225.26, 7875183.85, 7875144.42, 7875105.69,
   7875064.49, 7875015.5 , 7874954.94, 7874878.36, 7874783.13,
   7874674.  , 7875476.18, 7875410.05, 7875351.67, 7875300.61])/7873 - 1000

def my_func( x, x0, y0, a, b, c, t, s):
    # cubic polynomial in the shifted coordinate plus a tanh step
    # of height t and steepness s
    xs = x-x0
    p = a * xs**3 + b * xs**2 + c * xs + y0
    step = t * np.tanh( s * xs )
    return p + step

xth = np.linspace( -1.15, 1.5, 50 )
yth = my_func( xth, 0.03, 0.18, .01, 0, 0.05, .05 , 10)

# fit the model, starting from the same hand-tuned guess used for yth
sol, err = curve_fit( my_func, x, y, p0=[0.03, 0.18, .01, 0, 0.05, .05 , 10] )
print( sol )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( x, y )
ax.plot( xth, yth )
ax.plot( xth, my_func( xth, *sol) )
plt.show()

giving

>>[ 2.86281016e-02  1.95292660e-01  9.62290944e-03 -1.26304655e-02 5.11281073e-02  4.63955967e-02  1.02260568e+01]

and a plot of the data, the initial guess and the fitted curve: [image]
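
Since the goal in the question is the length of the line, the fitted curve can be measured by sampling it densely and summing the short straight segments between samples. The sketch below is not part of the original answer: it assumes the scaling used above (x/732 - 1000 and y/7873 - 1000), copies the rounded parameters from the printed output, and introduces a hypothetical helper, fitted_length, that converts the samples back to map coordinates before summing. The x-range is taken from the smallest and largest x values in the data.

```python
import numpy as np

# Scale factors copied from the normalisation above: x/732 - 1000, y/7873 - 1000
X_SCALE, Y_SCALE, OFFSET = 732.0, 7873.0, 1000.0

def my_func(x, x0, y0, a, b, c, t, s):
    # Same model as above: cubic polynomial plus a tanh step
    xs = x - x0
    return a * xs**3 + b * xs**2 + c * xs + y0 + t * np.tanh(s * xs)

def fitted_length(params, xs_min, xs_max, n=2001):
    """Approximate the curve length in map units by summing short chords.

    Hypothetical helper for illustration; xs_min and xs_max are in scaled x.
    """
    xs = np.linspace(xs_min, xs_max, n)   # dense samples in scaled x
    ys = my_func(xs, *params)             # fitted curve in scaled y
    x_map = (xs + OFFSET) * X_SCALE       # undo x/732 - 1000
    y_map = (ys + OFFSET) * Y_SCALE       # undo y/7873 - 1000
    return float(np.sum(np.hypot(np.diff(x_map), np.diff(y_map))))

# Fitted parameters as printed above (rounded, so the result is an estimate)
sol = [2.86281016e-02, 1.95292660e-01, 9.62290944e-03, -1.26304655e-02,
       5.11281073e-02, 4.63955967e-02, 1.02260568e+01]

# Scaled x-range of the data (smallest and largest x in the question)
xs_min = 731214.21 / X_SCALE - OFFSET
xs_max = 733023.08 / X_SCALE - OFFSET

length = fitted_length(sol, xs_min, xs_max)
print(length)
```

Because the model is an empirical y(x) and the parameters are rounded, treat the result as an estimate of the length along the fitted curve, not along the raw points. Increasing n should change it only marginally.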

Upvotes: 2

James Phillips

Reputation: 4657

Here is what I came up with after hammering on this for a few hours. I began by observing that there are approximately two data regions, the lower half and the upper half of the data ranges, with different characteristics in each half. The upper half is flatter with fewer data points, and the lower half has more curvature with small groups of nearly overlapping data points. Below is my attempt to separately model these two regions as a first cut at the problem. I have included a "zoomed" plot showing the disjointed overlap region which makes this code unsatisfactory in its present form. I feel confident that I could beat on this for another day or two and get it into better shape, but this solution might not be what you need.

import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

cutoffVal = 732200.0 # x below or above this value

xData = numpy.asarray([731501.13, 731430.24, 731360.29, 731289.36, 731909.72, 731827.89,
   731742, 731657.74, 731577.95, 731502.64, 731430.39, 731359.12,
   731287.3, 731214.21, 732015.59, 731966.88, 731902.67, 731826.31,
   731743.79, 731660.94, 731581.29, 731505.4, 731431.95, 732048.71,
   732026.66, 731995.46, 731952.18, 731894.29, 731823.58, 731745.16,
   732149.61, 732091.53, 732052.98, 732026.82, 732005.17, 731977.63,
   732691.84, 732596.62, 732499.45, 732401.62, 732306.18, 732218.35,
   732141.82, 732080.91, 732038.21, 732009.08, 733023.08, 732951.99,
   732873.32, 732787.51])


yData = numpy.asarray([7873771.69, 7873705.34, 7873638.03, 7873571.73, 7874082.33,
   7874027.2, 7873976.22, 7873923.58, 7873866.35, 7873804.53,
   7873739.58, 7873673.62, 7873608.23, 7873544.15, 7874286.21,
   7874197.15, 7874123.96, 7874063.21, 7874008.78, 7873954.69,
   7873897.31, 7873836.09, 7873772.38, 7874564.62, 7874448.23,
   7874341.23, 7874246.59, 7874166.93, 7874100.4, 7874041.77,
   7874912.56, 7874833.09, 7874733.62, 7874621.43, 7874504.65,
   7874393.89, 7875225.26, 7875183.85, 7875144.42, 7875105.69,
   7875064.49, 7875015.5, 7874954.94, 7874878.36, 7874783.13,
   7874674. , 7875476.18, 7875410.05, 7875351.67, 7875300.61])


# split off data into above and below cutoff using a boolean mask
aboveMask = xData > cutoffVal
xAbove = xData[aboveMask]
yAbove = yData[aboveMask]
xBelow = xData[~aboveMask]
yBelow = yData[~aboveMask]

# to fit for data above the cutoff value use a quadratic logarithmic equation
def aboveFunc(x, a, b, c):
    return a + b*numpy.log(x) + c*numpy.power(numpy.log(x), 2.0)

# to fit for data below the cutoff value use a hyperbolic type with offset
def belowFunc(x, a, b, c):
    val = x - cutoffVal
    return val / (a + (b * val) - ((a + b) * val * val)) + c

# some initial parameter values
initialParameters_above = numpy.array([1.0, 1.0, 1.0])
initialParameters_below = numpy.array([-4.29E-04, 4.31E-04,  7.87E+06])

# curve fit the equations individually to their respective data
aboveParameters, pcov = curve_fit(aboveFunc, xAbove, yAbove, initialParameters_above)
belowParameters, pcov = curve_fit(belowFunc, xBelow, yBelow, initialParameters_below)

# for plotting the fitting results
xModelAbove = numpy.linspace(max(xBelow), max(xAbove))
xModelBelow = numpy.linspace(min(xBelow), max(xBelow))
y_fitAbove = aboveFunc(xModelAbove, *aboveParameters)
y_fitBelow = belowFunc(xModelBelow, *belowParameters)

plt.plot(xData, yData, 'D') # plot the raw data as a scatterplot
plt.plot(xModelAbove, y_fitAbove) # plot the above equation using the fitted parameters
plt.plot(xModelBelow, y_fitBelow) # plot the below equation using the fitted parameters
plt.show()

print('Above parameters:', aboveParameters)
print('Below parameters:', belowParameters)

Upvotes: 1
