Hi! I have a quick question about regression analysis in Python. I have the following dataframe:
group Y X
1 9 3
1 5 4
1 3 1
2 1 6
2 2 4
2 3 9
Y is the dependent variable and X is the independent variable. I want to run the regression Y = a + bX separately for each group and output another dataframe containing the coefficient, t-stat, intercept and R-squared for each group. The dataframe should look like this:
group coefficient t-stats intercept r-square
1 0.25 1.4 4.3 0.43
2 0.30 2.4 3.6 0.49
... ... ... ... ...
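For reference, these are the standard OLS formulas behind the requested columns, for the single-regressor model Y = a + bX fitted within one group of n rows. The "t-stats" column is assumed here to mean the t-statistic of the slope b:

```latex
\begin{aligned}
S_{xy} &= \sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y), \qquad S_{xx} = \sum_{i=1}^{n}(x_i-\bar x)^2 \\
\hat b &= \frac{S_{xy}}{S_{xx}}, \qquad \hat a = \bar y - \hat b\,\bar x \\
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat a - \hat b\,x_i\right)^2 \\
t_{\hat b} &= \frac{\hat b}{\sqrt{s^2 / S_{xx}}} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat a - \hat b\,x_i\right)^2}{\sum_{i=1}^{n}(y_i-\bar y)^2}
\end{aligned}
```

With only three rows per group in this sample, n - 2 = 1 residual degree of freedom, so the t-statistics rest on very little data.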
Can someone help? Many thanks in advance.
I'll show a mockup so you can build the rest. The idea is to write a custom regression function and pass each group's dataframe to it using apply.
Let me know what you think.
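import pandas as pd
import statsmodels.api as sm

def GroupRegress(data, yvar, xvars):
    Y = data[yvar]
    # copy so that adding a column does not modify (or warn about) a slice of the original frame
    X = data[xvars].copy()
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2],
                   'Y': [9, 5, 3, 1, 2, 3],
                   'X': [3, 4, 1, 6, 4, 9]})

# apply calls GroupRegress once per group's sub-frame, passing 'Y' and ['X'] as extra arguments
df.groupby('group').apply(GroupRegress, 'Y', ['X'])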
Result below:
X intercept
group
1 1.000000 3.0
2 0.236842 0.5