Suppose I have the following data frame:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# 2 watering schedules x 3 sun levels, 5 plants per cell (30 rows)
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8],
                   'phosphorus': [8, 9, 3, 5, 6, 5, 7, 6, 4, 5,
                                  6, 6, 7, 8, 8, 3, 4, 4, 4, 15,
                                  4, 6, 4, 15, 4, 5, 6, 6, 17, 8]})
I want to perform an ANCOVA with IV = [water, sun], DV = height, and covariate = phosphorus. This is a brief summary of the DV by each IV:
water = df.groupby('water').agg({'height': ['count','mean','std','var']}).reset_index()
sun = df.groupby('sun').agg({'height': ['count','mean','std','var']}).reset_index()
# Height by Water
count mean std var
0 daily 15 5.87 0.99 0.98
1 weekly 15 4.80 1.37 1.89
# Height by Sun
count mean std var
0 high 10 6.6 0.97 0.93
1 low 10 4.9 1.10 1.21
2 med 10 4.5 0.71 0.50
Using OLS, I fit the following ANCOVA model:
# Fit the ANCOVA Model
model = sm.formula.ols('height ~ C(sun) + C(water) + phosphorus', data=df).fit() # build and fit the model
ancova_table = sm.stats.anova_lm(model, typ=2) # Type II ANOVA table for the fitted model
alpha = .05
# Print Ancova Table
print(ancova_table)
The ANOVA table indicates that sun and water are both significant predictors, and that phosphorus is also a significant covariate (all p-values are below alpha = .05).
sum_sq df F PR(>F)
C(sun) 19.79 2.0 20.49 5.38e-06
C(water) 9.71 1.0 20.10 1.42e-04
phosphorus 3.20 1.0 6.63 1.64e-02
Residual 12.07 25.0 NaN NaN
My question is: How can I perform a post-hoc analysis of this ANCOVA model? Specifically, how can I calculate the mean of height for each level of water and sun after adjusting for phosphorus as a covariate? How would it differ from the mean before adjusting for the covariate?
# Without Adjustment for covariate?
# Height by Water
count mean std var
0 daily 15 5.87 0.99 0.98
1 weekly 15 4.80 1.37 1.89
# Height by Sun
count mean std var
0 high 10 6.6 0.97 0.93
1 low 10 4.9 1.10 1.21
2 med 10 4.5 0.71 0.50
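To make "adjusted mean" concrete, here is a minimal sketch of one common definition (often called an estimated marginal mean): predict height from the fitted model for every sun x water combination with phosphorus held at its overall mean, then average those predictions within each level. This assumes that definition is the one wanted; the names grid, adj_by_water, adj_by_sun, raw_by_water and raw_by_sun are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8],
                   'phosphorus': [8, 9, 3, 5, 6, 5, 7, 6, 4, 5,
                                  6, 6, 7, 8, 8, 3, 4, 4, 4, 15,
                                  4, 6, 4, 15, 4, 5, 6, 6, 17, 8]})

# Same additive ANCOVA model as in the question
model = smf.ols('height ~ C(sun) + C(water) + phosphorus', data=df).fit()

# Reference grid (assumed definition): every sun x water combination,
# with the covariate fixed at its grand mean
grid = pd.DataFrame(
    [(s, w) for s in df['sun'].unique() for w in df['water'].unique()],
    columns=['sun', 'water'],
)
grid['phosphorus'] = df['phosphorus'].mean()
grid['adj_height'] = model.predict(grid)

# Adjusted means: average the grid predictions within each factor level
adj_by_water = grid.groupby('water')['adj_height'].mean()
adj_by_sun = grid.groupby('sun')['adj_height'].mean()

# Unadjusted (raw) group means for comparison
raw_by_water = df.groupby('water')['height'].mean()
raw_by_sun = df.groupby('sun')['height'].mean()

print(pd.DataFrame({'raw': raw_by_water, 'adjusted': adj_by_water}))
print(pd.DataFrame({'raw': raw_by_sun, 'adjusted': adj_by_sun}))
```

Because this design is balanced (5 plants in every sun x water cell), each adjusted mean under this definition equals the raw group mean minus the phosphorus slope times the difference between that group's mean phosphorus and the overall mean phosphorus. Groups with typical phosphorus levels therefore barely move, while groups with unusually high or low phosphorus shift the most.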