Fabio Lamanna

Reputation: 21552

pandas - linear regression of dataframe columns values

I have a pandas dataframe df like:

A,B,C
1,1,1
0.8,0.6,0.9
0.7,0.5,0.8
0.2,0.4,0.1
0.1,0,0

where each of the three columns holds sorted values in the range [0, 1]. I'm trying to plot a linear regression over the three series. So far I have been able to use scipy.stats as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

xi = np.arange(len(df))

slope, intercept, r_value, p_value, std_err = stats.linregress(xi,df['A'])
line1 = intercept + slope*xi
slope, intercept, r_value, p_value, std_err = stats.linregress(xi,df['B'])
line2 = intercept + slope*xi
slope, intercept, r_value, p_value, std_err = stats.linregress(xi,df['C'])
line3 = intercept + slope*xi

plt.plot(line1,'r-')
plt.plot(line2,'b-')
plt.plot(line3,'g-')

plt.plot(xi,df['A'],'ro')
plt.plot(xi,df['B'],'bo')
plt.plot(xi,df['C'],'go')

obtaining the following plot:

[Plot: the three series as red, blue, and green points, each with its own fitted regression line]

Is it possible to obtain a single linear regression that summarizes the three individual regressions using scipy.stats?

Upvotes: 2

Views: 5626

Answers (1)

Primer

Reputation: 10302

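Perhaps something like this: repeat xi once per column, concatenate the three columns, and fit one regression on the pooled points: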

# pd.np is deprecated and no longer available in recent pandas versions; use NumPy directly
x = np.tile(xi, 3)
y = np.r_[df['A'], df['B'], df['C']]

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
line4 = intercept + slope * xi

plt.plot(line4,'k-')
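
For reference, here is a self-contained sketch of the same pooling idea that does not hard-code the column names. It rebuilds the sample dataframe from the question; the names `pooled`, `x`, and `y` are only illustrative, and flattening with `ravel(order="F")` is one possible way to stack the columns, not the only one.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample data copied from the question
df = pd.DataFrame({
    "A": [1, 0.8, 0.7, 0.2, 0.1],
    "B": [1, 0.6, 0.5, 0.4, 0],
    "C": [1, 0.9, 0.8, 0.1, 0],
})

xi = np.arange(len(df))

# Stack the columns end to end (column-major order), so that the
# k-th block of y lines up with one repetition of xi in x.
y = df.to_numpy().ravel(order="F")
x = np.tile(xi, df.shape[1])

# One regression over all pooled points
pooled = stats.linregress(x, y)
line4 = pooled.intercept + pooled.slope * xi
```

Because all three series share the same x values, the pooled slope and intercept should equal the averages of the three individual slopes and intercepts. If the series had different x values, that shortcut would no longer hold, but the pooled fit above would still work.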

Upvotes: 2
