Alvaro Hernandorena

Reputation: 610

Can I make a logarithmic regression on sklearn?

I don't know if "logarithmic regression" is the right term. I need to fit a curve to my data: something like a polynomial curve, but flattening out at the end.

Here is an image: the blue curve is what I have (2nd-order polynomial regression), and the magenta curve is what I need.

[Image: plot of the data with the blue and magenta curves]

I have searched a lot and can't find it: sklearn has linear regression and polynomial regression, but no logarithmic regression. I need to plot the curve and then make predictions with that regression.

EDIT

Here is the data for the plot image that I posted:

x,y
670,75
707,46
565,47
342,77
433,73
472,46
569,52
611,60
616,63
493,67
572,11
745,12
483,75
637,75
218,251
444,72
305,75
746,64
444,98
342,117
272,85
128,275
500,75
654,65
241,150
217,150
426,131
155,153
841,66
737,70
722,70
754,60
664,60
688,60
796,55
799,62
229,150
232,95
116,480
340,49
501,65

Upvotes: 3

Views: 23664

Answers (3)

Jack Chi

Reputation: 106

If I understand correctly, you want to fit the data with a function like y = a * exp(-b * (x - c)) + d.

I am not sure whether sklearn can do it, but you can use scipy.optimize.curve_fit() (from SciPy) to fit your data with whatever function you define.

For your case, I experimented with your data and here is the result:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

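# skip_header=1 skips the "x,y" header row; drop it if your file has no header
my_data = np.genfromtxt('yourdata.csv', delimiter=',', skip_header=1)
# sort the rows by x so the fitted curve is drawn as a single line
my_data = my_data[my_data[:,0].argsort()]
xdata = my_data[:,0]
ydata = my_data[:,1]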

# define a function for fitting
def func(x, a, b, c, d):
    return a * np.exp(-b * (x - c)) + d

# initial guesses for a, b, c, d (each must lie within the bounds below)
init_vals = [50, 0, 90, 63]
# fit the data and get the fitted parameters
popt, pcov = curve_fit(func, xdata, ydata, p0=init_vals, bounds=([0, 0, 90, 0], [1000, 0.1, 200, 200]))
# predict a new value from the fit
y_pred = func(200, *popt)
print(y_pred)

plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, func(xdata, *popt), '-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

plot from the code above

I found that the initial value of b is critical for the fit, so I restricted b to a small range (the bounds above limit it to between 0 and 0.1) and then fit the data.
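One possible way to obtain starting values without guessing is to take them from a straight-line fit to log(y). The sketch below is only an illustration, not the code that produced the plot above. It drops c, because a * exp(-b * (x - c)) equals (a * exp(b * c)) * exp(-b * x), so c cannot be determined separately from a. The names decay, d0, b0 and p0, and the choice of half the smallest y as the starting plateau, are assumptions made here.

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the question
x = np.array([670, 707, 565, 342, 433, 472, 569, 611, 616, 493,
              572, 745, 483, 637, 218, 444, 305, 746, 444, 342,
              272, 128, 500, 654, 241, 217, 426, 155, 841, 737,
              722, 754, 664, 688, 796, 799, 229, 232, 116, 340,
              501], dtype=float)
y = np.array([75, 46, 47, 77, 73, 46, 52, 60, 63, 67,
              11, 12, 75, 75, 251, 72, 75, 64, 98, 117,
              85, 275, 75, 65, 150, 150, 131, 153, 66, 70,
              70, 60, 60, 60, 55, 62, 150, 95, 480, 49,
              65], dtype=float)

def decay(x, a, b, d):
    """Exponential decay towards a plateau d (three-parameter form)."""
    return a * np.exp(-b * x) + d

# Assumed starting plateau: half of the smallest observed y, so y - d0 > 0
d0 = 0.5 * y.min()
# A straight-line fit to log(y - d0) gives rough values for ln(a) and -b
slope, intercept = np.polyfit(x, np.log(y - d0), 1)
b0 = float(np.clip(-slope, 1e-4, 0.1))  # keep the guess inside the bounds
p0 = [np.exp(intercept), b0, d0]

# Same upper limit on b (0.1) as in the bounds above;
# max_nfev raises the evaluation limit as a precaution
popt, pcov = curve_fit(decay, x, y, p0=p0,
                       bounds=([0, 0, 0], [np.inf, 0.1, y.max()]),
                       max_nfev=10000)
print("a, b, d =", popt)
print("prediction at x = 200:", decay(200.0, *popt))
```

The fitted values still depend on the bounds and on the assumed plateau guess, so it is worth checking the resulting curve against the data before trusting predictions.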

If you have no a priori knowledge of the relationship between x and y, you can use the regression methods provided by sklearn, such as linear regression, kernel ridge regression (KRR), nearest neighbors regression, Gaussian process regression, etc., to fit nonlinear data. See the sklearn documentation for details.
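As one concrete instance of that route, the following sketch fits a nearest neighbors regressor to the data from the question. The choice of KNeighborsRegressor with n_neighbors=5 is an assumption made for illustration and has not been tuned for this data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Data from the question
x = np.array([670, 707, 565, 342, 433, 472, 569, 611, 616, 493,
              572, 745, 483, 637, 218, 444, 305, 746, 444, 342,
              272, 128, 500, 654, 241, 217, 426, 155, 841, 737,
              722, 754, 664, 688, 796, 799, 229, 232, 116, 340,
              501], dtype=float)
y = np.array([75, 46, 47, 77, 73, 46, 52, 60, 63, 67,
              11, 12, 75, 75, 251, 72, 75, 64, 98, 117,
              85, 275, 75, 65, 150, 150, 131, 153, 66, 70,
              70, 60, 60, 60, 55, 62, 150, 95, 480, 49,
              65], dtype=float)

# sklearn expects a 2-D feature matrix of shape (n_samples, n_features)
X = x.reshape(-1, 1)

# n_neighbors=5 is an arbitrary illustrative choice
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X, y)

# Predict y for a few new x values
x_new = np.array([200.0, 500.0, 800.0]).reshape(-1, 1)
y_new = knn.predict(x_new)
print(y_new)
```

A model like this makes no assumption about the shape of the curve, so it only averages nearby training points and will not extrapolate a trend beyond the range of x it was trained on.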

Upvotes: 9

chaooder

Reputation: 1506

To use sklearn, you can first rewrite your model Y = A exp(-BX) as ln(Y) = ln(A) - BX, and then use LinearRegression to fit your data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Read Data
df = pd.read_csv('data.csv')

### Prepare X, Y & ln(Y)
X = df.sort_values(by=['x']).loc[:, 'x':'x']
Y = df.sort_values(by=['x']).loc[:, 'y':'y']
ln_Y = np.log(Y)

### Use the relation ln(Y) = ln(A) - BX to fit X to ln(Y)
from sklearn.linear_model import LinearRegression
exp_reg = LinearRegression()
exp_reg.fit(X, ln_Y)
#### You can also pass sample weights to give more weight to the smaller X values;
#### the weights 1/(X - 100)^2 used here are an arbitrary choice that grows as X gets smaller
exp_reg_weighted = LinearRegression()
exp_reg_weighted.fit(X, ln_Y, sample_weight=np.array(1/((X - 100).values**2)).reshape(-1))

### Get predicted values of Y
Y_pred = np.exp(exp_reg.predict(X))
Y_pred_weighted = np.exp(exp_reg_weighted.predict(X))

### Plot
plt.scatter(X, Y)
plt.plot(X, Y_pred, label='Default')
plt.plot(X, Y_pred_weighted, label='Weighted')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()

plt.show()
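Since the question also asks about making predictions, the fitted line can be converted back to A and B and applied to new X values. The following is a minimal sketch under the same model ln(Y) = ln(A) - BX. It inlines the data from the question as arrays instead of reading data.csv, and the helper name predict_y is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from the question
x = np.array([670, 707, 565, 342, 433, 472, 569, 611, 616, 493,
              572, 745, 483, 637, 218, 444, 305, 746, 444, 342,
              272, 128, 500, 654, 241, 217, 426, 155, 841, 737,
              722, 754, 664, 688, 796, 799, 229, 232, 116, 340,
              501], dtype=float)
y = np.array([75, 46, 47, 77, 73, 46, 52, 60, 63, 67,
              11, 12, 75, 75, 251, 72, 75, 64, 98, 117,
              85, 275, 75, 65, 150, 150, 131, 153, 66, 70,
              70, 60, 60, 60, 55, 62, 150, 95, 480, 49,
              65], dtype=float)

# Fit ln(Y) = ln(A) - B * X with ordinary least squares
reg = LinearRegression().fit(x.reshape(-1, 1), np.log(y))

# Recover the parameters of Y = A * exp(-B * X)
A = np.exp(reg.intercept_)
B = -reg.coef_[0]

def predict_y(x_new):
    """Predict Y for new X values by undoing the log transform."""
    x_new = np.asarray(x_new, dtype=float).reshape(-1, 1)
    return np.exp(reg.predict(x_new))

print("A =", A, "B =", B)
print("prediction at X = 200:", predict_y([200.0]))
```

Note that this model has no constant offset, so the fitted curve decays towards 0 rather than towards a nonzero plateau, and the least-squares error is minimized in ln(Y), not in Y.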


Upvotes: 4

binjip

Reputation: 544

Your data look like they follow an exponential decay.

You can log-transform your y variable and then use linear regression. This works because the log compresses large values of y more than small ones, which turns an exponential trend into a straight line.

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import expon

x = np.linspace(1, 10, 10)
y = np.array([30, 20, 12, 8, 7, 4, 3, 2, 2, 1])
y_fit = expon.pdf(x, scale=2)*100

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, y)
ax.plot(x, y_fit)
ax.set_ylabel('y (blue)')
ax.grid(True)

ax2 = ax.twinx()
ax2.scatter(x, np.log(y), color='red')
ax2.set_ylabel('log(y) (red)')

plt.show()
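The transform-then-regress step itself can be sketched as follows, using the same x and y arrays as above. np.polyfit stands in here for any linear regression routine, and the names slope, intercept and r2 are illustrative.

```python
import numpy as np

# Same example data as above
x = np.linspace(1, 10, 10)
y = np.array([30, 20, 12, 8, 7, 4, 3, 2, 2, 1])

# Fit a straight line to log(y): log(y) = intercept + slope * x
log_y = np.log(y)
slope, intercept = np.polyfit(x, log_y, 1)

# R^2 of the straight-line fit, measured in log space
resid = log_y - (slope * x + intercept)
r2 = 1.0 - np.sum(resid**2) / np.sum((log_y - log_y.mean())**2)

# Back-transform to get the fitted exponential curve in the original units
y_fit = np.exp(intercept) * np.exp(slope * x)
print("slope:", slope, "intercept:", intercept, "R^2 (log space):", r2)
```

An R^2 close to 1 in log space suggests that an exponential curve is a reasonable description of the data.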


Upvotes: 5
