Function for running a linear model and ANOVA on several variables and collecting the p-values in a data frame

Question

I have the variables pointA, pointB and pointC, and I would like to run each one through a linear model and ANOVA and collect the p-values in a data frame. My real data set has around 30 variables, so I'm looking for a function that loops over them and collects the p-values, rather than running each model by hand and assembling the data frame manually.

set.seed(1)
id <- rep(1:3,each=4)
trt <- rep(c("A","OA", "B", "OB"),3)
pointA <- sample(1:10,12, replace=TRUE)
pointB<- sample(1:10,12, replace=TRUE)
pointC<- sample(1:10,12, replace=TRUE)
df <- data.frame(id,trt,pointA, pointB,pointC)
df

   id trt pointA pointB pointC
1   1   A      8      8     10
2   1  OA      2      7      3
3   1   B      8      5      5
4   1  OB      5      9      4
5   2   A      9      5      7
6   2  OA      7      3      3
7   2   B      8      1      5
8   2  OB      6      1      8
9   3   A      6      4      1
10  3  OA      8      6      9
11  3   B      1      7      4
12  3  OB      5      5      9


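# Named get_pvalue rather than data, to avoid masking the built-in data() function
get_pvalue <- function(i) {
  # Fit outcome column i against treatment, then run the ANOVA
  lmdf <- lm(df[, i] ~ trt, data = df)
  anv <- anova(lmdf)
  # `Pr(>F)` has one entry per table row (trt, Residuals); the Residuals entry is NA
  pvalue <- anv$`Pr(>F)`[1]
  return(pvalue)
}
get_pvalue(5)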

I would like it to look something like:

 variable pvalue
1   pointA  0.714
2   pointB  0.949
3   pointC  0.080

George Savva · Accepted Answer

You can treat the columns of a data frame as a list and use sapply to iterate over the columns you want to use as outcomes.

Getting the p-value out of each ANOVA is a bit fiddly, and somebody may know a better way, but this works:

pvalues <- sapply( df[,3:5] , FUN = function(x) 
  summary(aov(x~df$trt))[[1]]$`Pr(>F)`[1]  
  )
data.frame(pvalues)

         pvalues
pointA 0.9737895
pointB 0.2482931
pointC 0.7660808
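
To get the two-column layout asked for in the question, the same idea can be written with lm and anova, looping over column names instead of columns. This is only a sketch: the helper name collect_pvalues is made up for illustration, and it assumes the outcome columns are the ones whose names start with "point".

```r
# Rebuild the example data so the block runs on its own
set.seed(1)
df <- data.frame(
  id     = rep(1:3, each = 4),
  trt    = rep(c("A", "OA", "B", "OB"), 3),
  pointA = sample(1:10, 12, replace = TRUE),
  pointB = sample(1:10, 12, replace = TRUE),
  pointC = sample(1:10, 12, replace = TRUE)
)

# Hypothetical helper: one lm + anova per outcome, p-values gathered in a data frame
collect_pvalues <- function(data, outcomes, predictor = "trt") {
  pvalue <- vapply(outcomes, function(v) {
    # reformulate() builds the formula, e.g. pointA ~ trt
    fit <- lm(reformulate(predictor, response = v), data = data)
    # Pick the predictor's row of the ANOVA table by name, so no NA comes along
    anova(fit)[predictor, "Pr(>F)"]
  }, numeric(1))
  data.frame(variable = outcomes, pvalue = unname(pvalue), row.names = NULL)
}

# Assumption: every outcome column name starts with "point"
outcomes <- grep("^point", names(df), value = TRUE)
result <- collect_pvalues(df, outcomes)
result
```

Selecting the columns by name means the call does not change when the real data set has 30 outcomes in different positions; only the pattern (or an explicit vector of names) needs adjusting.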

I did think about using a multivariate regression, but it was no easier to get the p-values for each outcome variable that way.
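
For comparison, here is roughly what the multivariate version could look like. It is a sketch, not necessarily what was tried above: it fits all outcomes in one lm call and recomputes each p-value from the stored F statistic. That overall F-test matches the trt ANOVA p-value only because trt is the sole predictor.

```r
# Rebuild the example data so the block runs on its own
set.seed(1)
df <- data.frame(
  id     = rep(1:3, each = 4),
  trt    = rep(c("A", "OA", "B", "OB"), 3),
  pointA = sample(1:10, 12, replace = TRUE),
  pointB = sample(1:10, 12, replace = TRUE),
  pointC = sample(1:10, 12, replace = TRUE)
)

# One model with three responses
fit <- lm(cbind(pointA, pointB, pointC) ~ trt, data = df)

# summary() returns one summary.lm per response; the p-value is not stored,
# so it is recomputed from the F statistic and its degrees of freedom
pvalues <- sapply(summary(fit), function(s) {
  f <- s$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
})
data.frame(pvalues)
```

The row names come out as "Response pointA" and so on, so this route still needs some tidying before it matches the requested layout.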
