How can I find a well-fitting trendline in Matplotlib for this data?

Question

My problem is supposedly simple: I have scatter data in X and Y and want a well-fitting trendline with a known equation, so that I can then convert LDR voltages into power readings. However, I'm having trouble generating a trendline in Matplotlib or SciPy that fits well, which I believe is because the relationship is logarithmic.

I'm using Spyder and Matplotlib, and first tried plotting the X (Thorlabs) and Y (LDR) data as a log-log scatter plot. Because the data still didn't look linear on those axes, I then used NumPy's Polynomial.fit with a degree of 5 or 6. This looked good, but when I inverted the axes to get something of the form [LDR] = f[Thorlabs], I noticed that the fit was suddenly poor at the extremes of my data.

According to this question, curve_fit seems to be the way to go. However, when I tried curve_fit as described here and raised the maximum number of function evaluations (maxfev), I got the error message "TypeError: can't multiply sequence by non-int of type 'numpy.float64'", which I suspect is because my data contains decimal points. I'm not sure how to account for this.

I have several mini-questions, then:

Am I misunderstanding the above examples?
Is there a better way I could go about trying to find the ideal trendline for this data? Is it possible that it's some sort of logarithmic relationship on top of a log-log plot?
Once I get a trendline, how can I make sure it fits well and can be displayed?


#import libraries
import matplotlib.pyplot as plt
import csv
import numpy as np
from numpy.polynomial import Polynomial
import scipy.optimize as opt

#initialise arrays - I create log arrays too so I can plot directly
deg = 6    #degree of polynomial fitting for Polynomial.fit()
thorlabs = []
logthorlabs = []

ldr = []
logldr = []

#read in LDR/Thorlabs datasets from file
with open('16ldr561nm.txt','r') as csvfile:
    plots = csv.reader(csvfile, delimiter='\t')
    for row in plots:
        thorlabs.append(float(row[0]))
        ldr.append(float(row[1]))
        logthorlabs.append(np.log(float(row[0])))
        logldr.append(np.log(float(row[1])))
#This seems to work just fine, I now have arrays containing data in float

#fit and plot log polynomials
p = Polynomial.fit(logthorlabs, logldr, deg)
plt.plot(*p.linspace()) #plot lines


#plot scatter graphs on log-log axis - either using log arrays or on loglog plot
#plt.loglog()
plt.scatter(logthorlabs, logldr, label='16bit ADC LDR1')
plt.xlabel('log Thorlabs laser power (microW)')
plt.ylabel('log LDR voltage (mV)')
plt.title('LDR voltage against laser power at 561nm')
plt.legend()
plt.show()


#attempt at using curve_fit - when using, comment out the above block
"""
# This is the function we are trying to fit to the data.
def func(x, a, b, c):
     return a * np.exp(-b * x) + c
#freaks out here as I get a type error which I am not sure how to account for


# Plot the actual data
plt.plot(thorlabs, ldr, ".", label="Data");

#Adjusted maxfev to 5000. I know you can make "guesses" here but I am not sure how to do so
# The actual curve fitting happens here
optimizedParameters, pcov = opt.curve_fit(func, thorlabs, ldr, maxfev=5000);

# Use the optimized parameters to plot the best fit
plt.plot(thorlabs, func(ldr, *optimizedParameters), label="fit");

# Show the graph
plt.legend();
plt.show();
"""

When using curve_fit, I get a "TypeError: can't multiply sequence by non-int of type 'numpy.float64'".
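
That error message is what Python produces when a plain list is multiplied by a NumPy float, so the decimal points are probably not the cause. thorlabs and ldr are Python lists, and func ends up being called with one of them (the plotting line passes the list ldr, where the x-values in thorlabs were presumably intended). A minimal sketch of one way around this, assuming the exponential model from the attempt above is kept: convert the lists with np.asarray and supply an initial guess through p0. The data below is synthetic stand-in data, and the p0 heuristic is an assumption rather than something derived from the original dataset.

```python
import numpy as np
import scipy.optimize as opt

def func(x, a, b, c):
    # Same model as in the question: exponential decay plus offset
    return a * np.exp(-b * x) + c

# Synthetic stand-in data (hypothetical, NOT the original measurements),
# stored as plain Python lists exactly like thorlabs and ldr in the question
rng = np.random.default_rng(0)
thorlabs = [float(v) for v in np.linspace(0.1, 5.0, 40)]
ldr = [2.5 * np.exp(-1.3 * v) + 0.5 + rng.normal(0, 0.01) for v in thorlabs]

# Convert the lists to float arrays so that arithmetic inside func is elementwise
x = np.asarray(thorlabs, dtype=float)
y = np.asarray(ldr, dtype=float)

# Rough initial guess (an assumption): amplitude from the data range,
# decay rate of 1, offset from the smallest y value
p0 = (y.max() - y.min(), 1.0, y.min())

params, pcov = opt.curve_fit(func, x, y, p0=p0, maxfev=5000)

# Evaluate the fitted curve at the x-values (not the y-values) for plotting
fitted = func(x, *params)
print("a, b, c =", params)
```

With real data, plt.plot(x, fitted) would then draw the fit over the scatter points. Whether an exponential is the right model for an LDR is a separate question; the sketch only addresses the TypeError and the initial guess.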

As I don't have enough reputation to post images, my raw dataset can be found here. (Otherwise I'd include the graphs!)

(Note that I actually have two datasets, but as I only want to know the principle for calculating a trendline for one, I've left out the other dataset above.)
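
On the questions of finding a usable trendline and checking how well it fits, one possible approach is sketched below. It assumes the goal is a calibration curve that turns an LDR voltage into a power reading, so it fits log(power) as a function of log(voltage) directly instead of inverting a fit made in the other direction, and then reports R² in log space and the worst relative error in linear units. The data is synthetic, and the degree of 2 and the helper voltage_to_power are hypothetical choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in data (hypothetical): a gently curved log-log relationship
rng = np.random.default_rng(1)
ldr = np.geomspace(10.0, 3000.0, 30)                  # LDR voltage in mV
log_v = np.log(ldr)
log_p = -3.0 + 1.2 * log_v + 0.05 * log_v**2 + rng.normal(0, 0.005, log_v.size)
thorlabs = np.exp(log_p)                              # laser power in microW

# Fit in the direction that will actually be used: log(power) from log(voltage)
fit = Polynomial.fit(log_v, log_p, deg=2)

def voltage_to_power(v_mV):
    """Hypothetical helper: convert LDR voltage (mV) to power (microW)."""
    return np.exp(fit(np.log(v_mV)))

# Goodness of fit in log space: coefficient of determination
pred = fit(log_v)
ss_res = np.sum((log_p - pred) ** 2)
ss_tot = np.sum((log_p - log_p.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Worst-case relative error in linear units, which shows problems at the extremes
max_rel_err = np.max(np.abs(voltage_to_power(ldr) - thorlabs) / thorlabs)

print("R^2 (log space):", r_squared)
print("max relative error:", max_rel_err)
print("coefficients (lowest order first):", fit.convert().coef)
```

A plot of the residuals log_p - pred against log_v is a useful companion to these numbers: a visible pattern suggests the model form is wrong, while a high polynomial degree that only fits well in the middle of the range tends to show up as large errors at the ends.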

