Jack Karrde

Reputation: 121

Migrating ANOVA p-value function from R to Python

Good day!

Is there an effective way to find the p-values of a 4-way ANOVA model in Python?

Something like this would have worked in R, inside a for loop over a bunch of simulations:

pValues[k] <- anova(lm(Yield ~ Water + Row + Column, data=y))$"Pr(>F)"[1]

I've tried researchpy and have since moved on to statsmodels, but I have no idea how to proceed from here:

pValues[k] = statsmodels.stats.anova_lm(data=y)."Pr(>F)"[1]

Upvotes: 1

Views: 361

Answers (1)

StupidWolf

Reputation: 46898

In R:

set.seed(111)
y = data.frame(matrix(rnorm(400),100,4))
colnames(y) = c("Yield","Water","Row","Column")

anova(lm(Yield ~ Water + Row + Column, data=y))
Analysis of Variance Table

Response: Yield
          Df  Sum Sq Mean Sq F value Pr(>F)
Water      1   0.364 0.36410  0.3122 0.5776
Row        1   0.518 0.51768  0.4440 0.5068
Column     1   0.703 0.70256  0.6025 0.4395
Residuals 96 111.942 1.16606           

write.csv(y,"y_data.csv",quote=FALSE,row.names=FALSE)

In Python, you can fit the model with ols and pass the fitted result to anova_lm (available as sm.stats.anova_lm in statsmodels) to obtain the same table:

import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

y = pd.read_csv("y_data.csv")

# Fit the linear model, then build the ANOVA table from the fitted result
mod = ols('Yield ~ Water + Row + Column',data=y).fit()
tab = sm.stats.anova_lm(mod)
print(tab)

            df      sum_sq   mean_sq         F    PR(>F)
Water      1.0    0.364100  0.364100  0.312247  0.577606
Row        1.0    0.517678  0.517678  0.443954  0.506818
Column     1.0    0.702561  0.702561  0.602508  0.439531
Residual  96.0  111.941964  1.166062       NaN       NaN

You can then pull out the p-value for the first term like this:

tab["PR(>F)"].iloc[0]
Out[8]: 0.5776056586929655

Upvotes: 1
