Asma

Reputation: 51

Programming R using chi-squared test

I want to run a chi-square test between my dependent variable and each of the 90 independent variables, and return a list of the names of the independent variables that have a p-value > 0.05. I tried a for loop, but it's not working. Can someone help me, please?

c=numeric(ncol(datam))
for(i in 2:ncol(datam)){
  a[i]=table(datam[,1], datam[,i])
  b[i]=chisq.test(a[i])
  if(b[i]$p.value>0.05) c=b[i]$data.name + c
  }
c

Upvotes: 0

Views: 81

Answers (2)

user2100721

Reputation: 3597

You can try this:

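# Test column 1 (the dependent variable) against every other column and keep the p-values
p_values <- sapply(2:ncol(datam), function(x) chisq.test(datam[, 1], datam[, x])$p.value)
# Names of the independent variables whose p-value exceeds 0.05
selected_variables <- names(datam)[-1][p_values > 0.05]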
selected_variables
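If you would rather keep the for loop, the sketch below shows one way it could be repaired. The original fails for three reasons: `a` and `b` are never created, `a[i]` and `b[i]` try to squeeze a whole table or test result into a single vector element, and `+` cannot append a name to a numeric vector. The `datam` built here is a made-up stand-in (the column names `y`, `x1`, `x2`, `x3` are assumptions), so swap in your own data.

```r
# Hypothetical stand-in for `datam`: the first column is the dependent variable
set.seed(1)
datam <- data.frame(
  y  = factor(sample(c("a", "b"), 200, replace = TRUE)),
  x1 = factor(sample(c("u", "v"), 200, replace = TRUE)),
  x2 = factor(sample(c("u", "v", "w"), 200, replace = TRUE))
)
datam$x3 <- datam$y  # identical to y, so its p-value is far below 0.05

not_significant <- character(0)          # collect names, so start with a character vector
for (i in 2:ncol(datam)) {
  tab  <- table(datam[[1]], datam[[i]])  # store the whole table in a plain variable
  test <- chisq.test(tab)                # the test result is a list; keep it whole too
  if (test$p.value > 0.05) {
    # append with c(); `+` only works on numbers
    not_significant <- c(not_significant, names(datam)[i])
  }
}
not_significant
```

The result is a character vector of column names, the same kind of object as `selected_variables` above.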

Upvotes: 1

Benjamin

Reputation: 17279

I'd recommend the broom package for making your life a little easier. The key is to generate a data frame of your results that you can then filter for the variables you find interesting.

library(broom)
set.seed(pi)
DF <- data.frame(x = factor(sample(LETTERS[1:4], 50, replace = TRUE, prob = c(1, 1, 1, 4))),
                 y1 = factor(sample(LETTERS[1:4], 50, replace = TRUE)),
                 y2 = factor(sample(LETTERS[1:4], 50, replace = TRUE)),
                 y3 = factor(sample(LETTERS[1:4], 50, replace = TRUE, prob = c(4, 1, 1, 1))),
                 y4 = factor(sample(LETTERS[1:4], 50, replace = TRUE)))

Results <- do.call(
  "rbind",
  lapply(names(DF)[-1],
         function(nm)
         {
           x <- chisq.test(DF[, 1], DF[[nm]])
           x <- tidy(x)
           x$name = nm
           x
         }
  )
)

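# Rows with a significant association; use `> 0.05` instead to list the
# variables the question asks for
Results[Results$p.value <= 0.05, ]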
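If broom is not available, a similar results table can be put together in base R. This is only a sketch: it keeps just the p-value rather than everything `tidy()` returns, and the data frame here is again simulated (the seed and sample size of 400 are arbitrary choices, not part of the code above).

```r
# Simulated data in the same shape as above: x is the dependent variable
set.seed(42)
DF <- data.frame(
  x  = factor(sample(LETTERS[1:4], 400, replace = TRUE)),
  y1 = factor(sample(LETTERS[1:4], 400, replace = TRUE)),
  y2 = factor(sample(LETTERS[1:4], 400, replace = TRUE)),
  y3 = factor(sample(LETTERS[1:4], 400, replace = TRUE))
)

# One p-value per independent variable; vapply() names the result after `nm`
p_values <- vapply(
  names(DF)[-1],
  function(nm) chisq.test(DF[[1]], DF[[nm]])$p.value,
  numeric(1)
)

# One row per variable, ready for filtering
Results <- data.frame(
  name    = names(p_values),
  p.value = unname(p_values),
  stringsAsFactors = FALSE
)

# Names of the variables with p-value > 0.05
Results$name[Results$p.value > 0.05]
```

Keeping the names and p-values side by side in one data frame makes it easy to change the cutoff later or sort by p-value.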

Upvotes: 3
