vp_050

Reputation: 508

Fix a column in for loop while doing Chi-square test

I want to perform a chi-square test of independence on the following dataset, which consists of four categorical variables. The test is performed on two variables at a time, with variable V4 fixed: essentially, I want to run it for three combinations, V1-V4, V2-V4, and V3-V4. I want to do this in a loop, since the actual analysis involves a large number of combinations.

V1  V2  V3  V4
A   SUV Yes Good
A   SUV No  Good
B   SUV No  Good
B   SUV Yes Satisfactory
C   car Yes Excellent
C   SUV No  Poor
D   SUV Yes Poor
D   van Yes Satisfactory
E   car No  Excellent
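
For reproducibility, the table above could be entered in R roughly as follows. This is a minimal sketch: the object name df is assumed to match the code further down, and the columns are kept as plain character vectors, which chisq.test() converts to factors internally.

```r
# Rebuild the sample data from the table as a data frame.
# Assumption: the object is called `df`, as in the loop code in this post.
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E"),
  V2 = c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", "van", "car"),
  V3 = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  V4 = c("Good", "Good", "Good", "Satisfactory", "Excellent",
         "Poor", "Poor", "Satisfactory", "Excellent"),
  stringsAsFactors = FALSE
)

# Quick check of the structure: 9 rows, 4 categorical columns.
str(df)
```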

What I have tried:

x <- c(1:3)
for (i in x) {
  test <- chisq.test(df[, i], df[, 4])
  out <- data.frame("X" = colnames(df)[i]
                    , "Y" = colnames(df[4])
                    , "Chi.Square" = round(test$statistic,3)
                    ,  "df"= test$parameter
                    ,  "p.value" = round(test$p.value, 3)
  )
  return(out)
}

However, I only receive the output for the V1-V4 combination. Reference for the code: Chi Square Analysis using for loop in R

Upvotes: 2

Views: 1007

Answers (2)

akrun

Reputation: 887951

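out is overwritten on each iteration, and return() stops the loop on its first pass (inside a function it exits immediately; at top level it raises an error), so only the V1-V4 result is left in out. Instead, we can initialize a list with the same length as 'x' and store each iteration's output in it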

x <- 1:3
# Pre-allocate one list slot per column to be tested against V4
out <- vector('list', length(x))
for (i in x) {
  test <- chisq.test(df[, i], df[, 4])
  # Store this iteration's summary row instead of overwriting `out`
  out[[i]] <- data.frame("X" = colnames(df[i]),
                         "Y" = colnames(df[4]),
                         "Chi.Square" = round(test$statistic, 3),
                         "df" = test$parameter,
                         "p.value" = round(test$p.value, 3))
}
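
Once every iteration is stored in the list, the pieces can be stacked into one table with do.call(rbind, ...). The following is a self-contained sketch of that step, not part of the loop as originally posted: the sample data is re-entered so it runs on its own, unname() is an added convenience that drops the "X-squared" and "df" labels so the row names stay clean, and suppressWarnings() hides the small-expected-count warning that chisq.test() raises on data this sparse.

```r
# Sample data from the question, re-entered so this sketch runs on its own.
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E"),
  V2 = c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", "van", "car"),
  V3 = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  V4 = c("Good", "Good", "Good", "Satisfactory", "Excellent",
         "Poor", "Poor", "Satisfactory", "Excellent"),
  stringsAsFactors = FALSE
)

x <- 1:3
out <- vector("list", length(x))

for (i in seq_along(x)) {
  col <- x[i]
  # The warning about the chi-squared approximation is suppressed here only
  # to keep the output readable; with 9 rows it is expected.
  test <- suppressWarnings(chisq.test(df[, col], df[, 4]))
  out[[i]] <- data.frame(
    X = colnames(df)[col],
    Y = colnames(df)[4],
    Chi.Square = round(unname(test$statistic), 3),
    df = unname(test$parameter),
    p.value = round(test$p.value, 3),
    stringsAsFactors = FALSE
  )
}

# Stack the per-column results into a single data frame.
result <- do.call(rbind, out)
print(result)
```

Indexing the list with seq_along(x) rather than with the column numbers themselves keeps the list free of empty slots if x is later changed to something like c(2, 5, 7).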

Upvotes: 3

Ronak Shah

Reputation: 389325

You can use lapply instead of an explicit loop and bind the per-column results into one data frame with do.call(rbind, ...).

x <- 1:3

out <- do.call(rbind, lapply(x, function(i) {
  test <- chisq.test(df[, i], df[, 4])
  data.frame("X" = colnames(df)[i],
             "Y" = colnames(df[4]),
             "Chi.Square" = round(test$statistic, 3),
             "df" = test$parameter,
             "p.value" = round(test$p.value, 3))
}))
rownames(out) <- NULL
out

#   X  Y Chi.Square df p.value
#1 V1 V4      14.25 12   0.285
#2 V2 V4      12.75  6   0.047
#3 V3 V4       2.25  3   0.522
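
When there are many columns, it may be easier to fix V4 by name and loop over every other column rather than hard-coding 1:3. Below is a sketch of that variation, under the assumption that all remaining columns are categorical; the names fixed and others are illustrative, and the sample data is re-entered so the block runs on its own.

```r
# Sample data from the question, re-entered so this sketch runs on its own.
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E"),
  V2 = c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", "van", "car"),
  V3 = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  V4 = c("Good", "Good", "Good", "Satisfactory", "Excellent",
         "Poor", "Poor", "Satisfactory", "Excellent"),
  stringsAsFactors = FALSE
)

fixed  <- "V4"                       # the column held constant (illustrative name)
others <- setdiff(names(df), fixed)  # every other column is tested against it

rows <- lapply(others, function(v) {
  # Small expected counts trigger a warning on this tiny sample; it is
  # suppressed here only to keep the output readable.
  test <- suppressWarnings(chisq.test(df[[v]], df[[fixed]]))
  data.frame(
    X = v,
    Y = fixed,
    Chi.Square = round(unname(test$statistic), 3),
    df = unname(test$parameter),
    p.value = round(test$p.value, 3),
    stringsAsFactors = FALSE
  )
})

out <- do.call(rbind, rows)
out
```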

Upvotes: 1
