Good day!
Is there an effective way to find the p-values of a 4-way ANOVA model in Python?
Something like this would have worked in R, inside a for loop over a bunch of simulations:
pValues[k] <- anova(lm(Yield ~ Water + Row + Column, data=y))$"Pr(>F)"[1]
I've tried researchpy and have since moved on to statsmodels, but I have no idea how to proceed from here:
pValues[k] = statsmodels.stats.anova_lm(data=y)."Pr(>F)"[1]
Reputation: 46898
In R:
set.seed(111)
y = data.frame(matrix(rnorm(400),100,4))
colnames(y) = c("Yield","Water","Row","Column")
anova(lm(Yield ~ Water + Row + Column, data=y))
Analysis of Variance Table

Response: Yield
          Df  Sum Sq Mean Sq F value Pr(>F)
Water      1   0.364 0.36410  0.3122 0.5776
Row        1   0.518 0.51768  0.4440 0.5068
Column     1   0.703 0.70256  0.6025 0.4395
Residuals 96 111.942 1.16606
write.csv(y,"y_data.csv",quote=FALSE,row.names=FALSE)
In Python, you can use the anova_lm function from statsmodels.stats to obtain the same table:
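import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

y = pd.read_csv("y_data.csv")
# Fit the linear model, then build the sequential (Type I) ANOVA table, like R's anova()
mod = ols('Yield ~ Water + Row + Column', data=y).fit()
tab = sm.stats.anova_lm(mod)
print(tab)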
            df      sum_sq   mean_sq         F    PR(>F)
Water      1.0    0.364100  0.364100  0.312247  0.577606
Row        1.0    0.517678  0.517678  0.443954  0.506818
Column     1.0    0.702561  0.702561  0.602508  0.439531
Residual  96.0  111.941964  1.166062       NaN       NaN
You can then pull out a p-value by position (Python indexing starts at 0, so position 0 corresponds to R's [1]):
tab["PR(>F)"].iloc[0]
Out[8]: 0.5776056586929655