FlyUFalcon

Reputation: 344

Regression by group and display output in python

Hi, I have a quick question about regression analysis in Python. I have the following DataFrame:

group      Y        X
 1         9        3
 1         5        4
 1         3        1
 2         1        6
 2         2        4
 2         3        9

Y is the dependent variable and X is the independent variable. I want to run the regression Y = a + bX separately for each group and output another DataFrame that contains the coefficient, its t-statistic, the intercept, and the R-squared. The DataFrame should look something like this:

group   coefficient   t-stats    intercept    r-square
  1        0.25         1.4        4.3         0.43
  2        0.30         2.4        3.6         0.49
 ...        ...         ...        ...         ...
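For reference, these are the standard textbook OLS quantities behind each column for a group with n observations. The t-statistic shown is the one for the slope b, with n - 2 residual degrees of freedom. For group 1, for example, S_xx = S_xy = 14/3, so b = 1 and a = 17/3 - 8/3 = 3.

```latex
\hat b = \frac{S_{xy}}{S_{xx}}, \qquad \hat a = \bar y - \hat b\,\bar x, \qquad
S_{xx} = \sum_i (x_i - \bar x)^2, \quad S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y)

\mathrm{SSE} = \sum_i \left(y_i - \hat a - \hat b\,x_i\right)^2, \qquad
s^2 = \frac{\mathrm{SSE}}{n - 2}, \qquad
t_{\hat b} = \frac{\hat b}{\sqrt{s^2 / S_{xx}}}

R^2 = 1 - \frac{\mathrm{SSE}}{\sum_i (y_i - \bar y)^2}
```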

Can someone help? Many thanks in advance.

Upvotes: 1

Views: 1599

Answers (1)

MEdwin

Reputation: 2960

Here is a mockup you can build on. The idea is to write a custom regression function and apply it to each group with groupby().apply, passing the column names as extra arguments.

Let me know what you think.

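import pandas as pd
import statsmodels.api as sm


def GroupRegress(data, yvar, xvars):
    # Fit Y = a + bX by OLS on one group and return the fitted parameters
    Y = data[yvar]
    # Copy the column slice so adding a column does not raise SettingWithCopyWarning
    X = data[xvars].copy()
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params


df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2],
                   'Y': [9, 5, 3, 1, 2, 3],
                   'X': [3, 4, 1, 6, 4, 9]})

# Arguments after the function ('Y', ['X']) are forwarded to GroupRegress;
# selecting ['Y', 'X'] keeps the grouping column out of each group's frame
print(df.groupby('group')[['Y', 'X']].apply(GroupRegress, 'Y', ['X']))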

Result below:

              X  intercept
group
1      1.000000        3.0
2      0.236842        0.5

Upvotes: 1
