Reputation: 11
my problem is allegedly simple - I have scatter data in X and Y, and want to get a nice, well-fitting trendline with a known equation so that I can go on to correspond LDR voltages into power readings. However, I'm having trouble with generating a trendline in Matplotlib or Scipy that fits well, which I believe is because there's a logarithmic relationship.
I'm using Spyder and Matplotlib, and first tried plotting the X (Thorlabs) and Y (LDR) data as a log-log scatter plot. Because the data didn't seem to show a linear relationship after doing this, I then used numpy's Polynomial.fit with degree 5 to 6. This looked good, but then when inverting the axes, so I could get something of the form [LDR] = f[Thorlabs], I noticed the fit was suddenly not very good at all at the extremes of my data.
Using this question using curve_fit seems to be the way to go, but I tried using curve_fit as described here and, after adjusting to increase the max number of curve-fit iterations, stumbled when I got the error message "TypeError: can't multiply sequence by non-int of type 'numpy.float64'", which will likely be because my data contains decimal points. I'm not sure how to account for this.
I have several mini-questions, then -
am I misunderstanding the above examples?
is there a better way I could go about trying to find the ideal trendline for this data? Is it possible that it's some sort of logarithmic relationship on top of a log-log plot?
once I get a trendline, how can I make sure it fits well and can be displayed?
#import libraries
import matplotlib.pyplot as plt
import csv
import numpy as np
from numpy.polynomial import Polynomial
import scipy.optimize as opt
#initialise arrays - I create log arrays too so I can plot directly
deg = 6 #degree of polynomial fitting for Polynomial.fit()
thorlabs = []
logthorlabs = []
ldr = []
logldr = []
#read in LDR/Thorlabs datasets from file
with open('16ldr561nm.txt','r') as csvfile:
plots = csv.reader(csvfile, delimiter='\t')
for row in plots:
thorlabs.append(float(row[0]))
ldr.append(float(row[1]))
logthorlabs.append(np.log(float(row[0])))
logldr.append(np.log(float(row[1])))
#This seems to work just fine, I now have arrays containing data in float
#fit and plot log polynomials
p = Polynomial.fit(logthorlabs, logldr, deg)
plt.plot(*p.linspace()) #plot lines
#plot scatter graphs on log-log axis - either using log arrays or on loglog plot
#plt.loglog()
plt.scatter(logthorlabs, logldr, label='16bit ADC LDR1')
plt.xlabel('log Thorlabs laser power (microW)')
plt.ylabel('log LDR voltage (mV)')
plt.title('LDR voltage against laser power at 561nm')
plt.legend()
plt.show()
#attempt at using curve_fit - when using, comment out the above block
"""
# This is the function we are trying to fit to the data.
def func(x, a, b, c):
return a * np.exp(-b * x) + c
#freaks out here as I get a type error which I am not sure how to account for
# Plot the actual data
plt.plot(thorlabs, ldr, ".", label="Data");
#Adjusted maxfev to 5000. I know you can make "guesses" here but I am not sure how to do so
# The actual curve fitting happens here
optimizedParameters, pcov = opt.curve_fit(func, thorlabs, ldr, maxfev=5000);
# Use the optimized parameters to plot the best fit
plt.plot(thorlabs, func(ldr, *optimizedParameters), label="fit");
# Show the graph
plt.legend();
plt.show();
"""
When using curve_fit, I get a "TypeError: can't multiply sequence by non-int of type 'numpy.float64'".
As I don't have enough reputation to post images, my raw dataset can be found here. (Otherwise I'd include the graphs!)
(Note that I actually have two datasets, but as I only want to know the principle for calculating a trendline for one, I've left out the other dataset above.)
Upvotes: 1
Views: 1032
Reputation: 168913
Refactoring your code a bit, most importantly to use native Numpy arrays once things have been parsed out from the file, makes things not crash, but the CurveFit line doesn't look good at all.
The code prints out the parameters fit by curve_fit
, which don't look very good either, and a warning too: "Covariance of the parameters could not be estimated". I'm no mathematician/statistician, so I don't know what to do there.
from numpy.polynomial import Polynomial
import csv
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
def read_dataset(filename):
x = []
y = []
with open(filename, "r") as csvfile:
plots = csv.reader(csvfile, delimiter="\t")
for row in plots:
x.append(float(row[0]))
y.append(float(row[1]))
# cast to native numpy arrays
x = np.array(x)
y = np.array(y)
return (x, y)
ldr, thorlabs = read_dataset("16ldr561nm.txt")
plt.scatter(thorlabs, ldr, label="Data")
plt.xlabel("Thorlabs laser power (microW)")
plt.ylabel("LDR voltage (mV)")
plt.title("LDR voltage against laser power at 561nm")
# Generate and plot polynomial
p = Polynomial.fit(thorlabs, ldr, 6)
plt.plot(*p.linspace(), label="Polynomial")
# Generate and plot curvefit
def func(x, a, b, c):
return a * np.exp(-b * x) + c
optimizedParameters, pcov = opt.curve_fit(func, thorlabs, ldr)
print(optimizedParameters, pcov)
plt.plot(thorlabs, func(ldr, *optimizedParameters), label="CurveFit")
# Show everything
plt.legend()
plt.show()
If you really need to log()
the data, it's easily done with
x = np.log(x)
y = np.log(y)
which will keep the arrays as NumPy arrays and be plenty faster than doing it "by hand".
Upvotes: 2