Abhik Sarkar

Reputation: 965

Alternative to Pandas OLS

I want to draw trend lines for a pandas Series. I liked the way this was done with pandas.ols. What is the current best alternative to pandas.ols?

Upvotes: 2

Views: 1419

Answers (1)

pjw

Reputation: 2325

Below is an example using the formula-based OLS interface (statsmodels.formula.api.ols) from StatsModels.

This shows 1st, 2nd, and 3rd order polynomial fits for a randomly-generated dataset (using Ordinary Least Squares).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

np.random.seed(654123)

# generate a random dataset with heteroscedasticity
nobs = 1000
x = np.random.uniform(-4, 4, nobs)
y = x + 0.25 * x**2 + 0.1 * np.exp(1 + np.abs(x)) * np.random.randn(nobs)

df = pd.DataFrame({'predictor': x, 'response': y})

x1 = pd.DataFrame({'predictor': np.linspace(df.predictor.min(), df.predictor.max(), nobs)})

poly_1 = smf.ols(formula='response ~ 1 + predictor', data=df).fit()
poly_2 = smf.ols(formula='response ~ 1 + predictor + I(predictor ** 2.0)', data=df).fit()
poly_3 = smf.ols(formula='response ~ 1 + predictor + I(predictor ** 2.0) + I(predictor ** 3.0)', data=df).fit()

plt.figure(figsize=(9 * 1.618, 9))
# label each curve with the R^2 of its own model
plt.plot(x1.predictor, poly_1.predict(x1), 'r-',
         label='1st order poly fit, $R^2$=%.2f' % poly_1.rsquared)
plt.plot(x1.predictor, poly_2.predict(x1), 'b-',
         label='2nd order poly fit, $R^2$=%.2f' % poly_2.rsquared)
plt.plot(x1.predictor, poly_3.predict(x1), 'g-',
         label='3rd order poly fit, $R^2$=%.2f' % poly_3.rsquared)

plt.plot(x, y, 'o', alpha=0.2)
plt.legend(loc="upper center", fontsize=14)
plt.show()


scipy.stats.linregress is another good option to explore if you only need a straight-line (first-order) fit.
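As a rough sketch of how that could look for a pandas Series: the Series `s` below is made-up example data, and using the numeric index as the x variable is an assumption on my part (a DatetimeIndex would need converting to numbers first).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical example data: a noisy linear trend on an integer index
rng = np.random.default_rng(0)
s = pd.Series(2.0 * np.arange(50) + 1.0 + rng.normal(scale=0.5, size=50))

# Assumption: the Series index serves as the x variable
x = np.asarray(s.index, dtype=float)
result = stats.linregress(x, s.to_numpy())

# Build the fitted trend line as a Series aligned with the original index
trend = pd.Series(result.intercept + result.slope * x,
                  index=s.index, name="trend")

print("slope=%.3f, intercept=%.3f, R^2=%.3f"
      % (result.slope, result.intercept, result.rvalue ** 2))
```

The resulting `trend` Series can be plotted alongside `s`, for example with `s.plot()` followed by `trend.plot()`.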

Upvotes: 2
