Reputation: 51
I want to run a chi-square test between my dependent variable and each of the 90 independent variables, and return a list of the names of the independent variables that have a p.value > 0.05. I tried a for loop but it's not working. Can someone help me, please?
c=numeric(ncol(datam))
for(i in 2:ncol(datam)){
a[i]=table(datam[,1], datam[,i])
b[i]=chisq.test(a[i])
if(b[i]$p.value>0.05) c=b[i]$data.name + c
}
c
Upvotes: 0
Views: 81
Reputation: 3597
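You can try this:
# One chi-square p-value per independent variable (columns 2..ncol), tested against column 1
p_values <- sapply(2:ncol(datam), function(x) chisq.test(datam[, 1], datam[, x])$p.value)
# Keep the names of the variables with p-value > 0.05, as asked in the question
selected_variables <- names(datam)[-1][p_values > 0.05]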
selected_variables
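If you want to check the idea on something small first, here is a minimal, self-contained sketch. The toy `datam` and its columns `y`, `x1`, `x2`, `x3` are made up for illustration, and it assumes the 0.05 cutoff from the question. It uses `vapply` over the column names so that each p-value stays labelled with its variable.

# Toy data (hypothetical): first column is the outcome, the rest are predictors
set.seed(1)
datam <- data.frame(
  y  = factor(sample(c("a", "b"), 200, replace = TRUE)),
  x1 = factor(sample(c("u", "v", "w"), 200, replace = TRUE)),
  x2 = factor(sample(c("u", "v"), 200, replace = TRUE))
)
datam$x3 <- datam$y  # identical to the outcome, so clearly associated

# One p-value per predictor, named by column
p_values <- vapply(
  names(datam)[-1],
  function(nm) chisq.test(datam[[1]], datam[[nm]])$p.value,
  numeric(1)
)

# Names of the predictors whose p-value exceeds 0.05
selected_variables <- names(p_values)[p_values > 0.05]
selected_variables

Because `x3` is a copy of `y`, it should never show up in `selected_variables`; whether `x1` and `x2` do depends on the random draw.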
Upvotes: 1
Reputation: 17279
I'd recommend the broom package for making your life a little easier. The key is to generate a data frame of your results that you can use to filter for the variables you find interesting.
library(broom)
set.seed(pi)
DF <- data.frame(x = factor(sample(LETTERS[1:4], 50, replace = TRUE, prob = c(1, 1, 1, 4))),
                 y1 = factor(sample(LETTERS[1:4], 50, replace = TRUE)),
                 y2 = factor(sample(LETTERS[1:4], 50, replace = TRUE)),
                 y3 = factor(sample(LETTERS[1:4], 50, replace = TRUE, prob = c(4, 1, 1, 1))),
                 y4 = factor(sample(LETTERS[1:4], 50, replace = TRUE)))
Results <- do.call(
  "rbind",
  lapply(names(DF)[-1],
         function(nm)
         {
           x <- chisq.test(DF[, 1], DF[[nm]])
           x <- tidy(x)
           x$name <- nm
           x
         }
  )
)
Results[Results$p.value <= 0.05, ]
Upvotes: 3