Reputation: 2294
I have this data:
puf = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
'val':[850,1889,3289,6083,10349,17860,28180,41236]})
The data seems to follow an exponential curve. Let's see the plot:
puf.plot('id','val')
I want to fit an exponential curve ($$ y = Ae^{Bx} $$
, A times e to the B*X)and add it as a column in Pandas. Firstly I tried to log the values:
puf['log_val'] = np.log(puf['val'])
And then to use Numpy to fit the equation:
puf['fit'] = np.polyfit(puf['id'],puf['log_val'],1)
But I get an error:
ValueError: Length of values (2) does not match length of index (8)
My expected result is the fitted values as a new column in Pandas. I attach an image with the column fitted values I want (in orange):
I'm stuck in this code. I'm not sure what I am doing wrong. How can I create a new column with my fitted values?
Upvotes: 3
Views: 1426
Reputation: 79188
Note that you asked for an exponential model yet you have the results for log-linear model.
Check out the work below:
For log-linear, we are fitting E(log(Y))
ie log(y) - (log(b[0]) +b[1]*x)
:
from scipy.optimize import least_squares
least_squares(lambda b: np.log(puf['val']) -(np.log(b[0]) + b[1] * puf['id']),
[1,1])['x']
array([5.99531305e+02, 5.51106793e-01])
These are the values that excel gives.
On the other hand to fit an exponential curve, the randomness is on Y and not on its logarithm, E(Y)=b[0]*exp(b[1] *x)
Hence we have:
least_squares(lambda b: puf['val'] - b[0]*exp(b[1] * puf['id']), [0,1])['x']
array([1.08047304e+03, 4.58116127e-01]) # correct results for exponential fit
Depending on your model choice, the values are alittle different.
Better Model? Since you have same number of parameters, consider the one that gives you lower deviance or better out of sample prediction
Note that the ideal exponential model is E(Y) = A'B'^X
which for comparison can be written as log(E(Y)) = A + XB
while log-linear model will be E(log(Y) = A + XB
. Note the difference in Expectation.
From the two models we have:
Notice how when we go to higher numbers the log-linear overestimates. While in the lower numbers the exponential overestimates.
Code for image:
from scipy.optimize import least_squares
log_lin = least_squares(lambda b: np.log(puf['val']) -(np.log(b[0]) + b[1] * puf['id']),
[1,1])['x']
expo = least_squares(lambda b: puf['val'] - b[0]*exp(b[1] * puf['id']), [0,1])['x']
exp_fun = lambda x: expo[0] * exp(expo[1]*x)
log_lin_fun = lambda x:log_lin[0] * exp(log_lin[1]*x)
plt.plot(puf.id, puf.val, label = 'original')
plt.plot(puf.id, exp_fun(puf.id), label='exponential')
plt.plot(puf.id, log_lin_fun(puf.id), label='log-linear')
plt.legend()
Upvotes: 2
Reputation: 14094
Your getting that error because np.polyfit(puf['id'],puf['log_val'],1)
returns two values array([0.55110679, 6.39614819])
which isn't the shape of your dataframe.
This is what you want
y = a* exp (b*x) -> ln(y)=ln(a)+bx
f = np.polyfit(df['id'], np.log(df['val']), 1)
where
a = np.exp(f[1]) -> 599.5313046712091
b = f[0] -> 0.5511067934637022
Giving
puf['fit'] = a * np.exp(b * puf['id'])
id val fit
0 1 850 1040.290193
1 2 1889 1805.082864
2 3 3289 3132.130026
3 4 6083 5434.785677
4 5 10349 9430.290286
5 6 17860 16363.179739
6 7 28180 28392.938399
7 8 41236 49266.644002
Upvotes: 1