Reputation: 965
I want to draw trend lines for a pandas Series.
I liked the way this was done with pandas.ols, which is no longer available.
What is the current best alternative to pandas.ols?
Upvotes: 2
Views: 1419
Reputation: 2325
Below is an example using the formula API for linear regression in statsmodels.
It fits 1st-, 2nd-, and 3rd-order polynomials to a randomly generated dataset using ordinary least squares, and plots each fit with its R² value.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
np.random.seed(654123)
# generate a random dataset with heteroscedasticity
nobs = 1000
x = np.random.uniform(-4, 4, nobs)
y = x + 0.25 * x**2 + 0.1 * np.exp(1 + np.abs(x)) * np.random.randn(nobs)
df = pd.DataFrame({'predictor': x, 'response': y})
x1 = pd.DataFrame({'predictor': np.linspace(df.predictor.min(), df.predictor.max(), nobs)})
poly_1 = smf.ols(formula='response ~ 1 + predictor', data=df).fit()
poly_2 = smf.ols(formula='response ~ 1 + predictor + I(predictor ** 2.0)', data=df).fit()
poly_3 = smf.ols(formula='response ~ 1 + predictor + I(predictor ** 2.0) + I(predictor ** 3.0)', data=df).fit()
plt.figure(figsize=(9 * 1.618, 9))
# label each curve with the R^2 of its own model
plt.plot(x1.predictor, poly_1.predict(x1), 'r-',
         label='1st order poly fit, $R^2$=%.2f' % poly_1.rsquared)
plt.plot(x1.predictor, poly_2.predict(x1), 'b-',
         label='2nd order poly fit, $R^2$=%.2f' % poly_2.rsquared)
plt.plot(x1.predictor, poly_3.predict(x1), 'g-',
         label='3rd order poly fit, $R^2$=%.2f' % poly_3.rsquared)
plt.plot(x, y, 'o', alpha=0.2)
plt.legend(loc="upper center", fontsize=14)
plt.show()
scipy.stats.linregress is another good option you could explore.
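As a rough sketch of that option, here is how a straight trend line could be computed for a Series with scipy.stats.linregress. The data and the names series, x, fit, and trend are made up for illustration, and the integer position is assumed to be the x-axis; a DatetimeIndex would need converting to numbers first.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical example data: a noisy upward trend (not from the question).
rng = np.random.default_rng(0)
series = pd.Series(2.0 * np.arange(50) + 1.0 + rng.normal(scale=0.5, size=50))

# Assumption: use the integer position of each observation as x.
x = np.arange(len(series))
fit = stats.linregress(x, series.to_numpy())

# Evaluate the fitted line at every position and keep the original index,
# so the trend can be plotted alongside the Series.
trend = pd.Series(fit.intercept + fit.slope * x,
                  index=series.index, name="trend")

print("slope=%.3f, intercept=%.3f, R^2=%.3f"
      % (fit.slope, fit.intercept, fit.rvalue ** 2))
```

Since trend shares the index of series, something like series.plot() followed by trend.plot() should overlay the two. linregress only fits a straight line, so the polynomial fits above still need statsmodels or a similar tool.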
Upvotes: 2