Reputation: 508
I want to perform a chi-square test of independence on the following dataset, which consists of four categorical variables. The test is run on two variables at a time, with V4 fixed, so there are three combinations: V1-V4, V2-V4, and V3-V4. I want to do this in a loop, since the actual analysis covers a large number of combinations.
V1 V2 V3 V4
A SUV Yes Good
A SUV No Good
B SUV No Good
B SUV Yes Satisfactory
C car Yes Excellent
C SUV No Poor
D SUV Yes Poor
D van Yes Satisfactory
E car No Excellent
What I have tried:
x <- c(1:3)
for (i in x) {
  test <- chisq.test(df[, i], df[, 4])
  out <- data.frame("X" = colnames(df)[i]
                    , "Y" = colnames(df[4])
                    , "Chi.Square" = round(test$statistic,3)
                    , "df"= test$parameter
                    , "p.value" = round(test$p.value, 3)
                    )
  return(out)
}
However, I only receive the output for the V1-V4 combination. Reference for code: Chi Square Analysis using for loop in R
Upvotes: 2
Views: 1007
Reputation: 887951
The object 'out' is overwritten in each iteration, and the return(out) call stops the loop during the first pass, which is why the OP only gets the V1-V4 result. We can initialize a list with the length of 'x' and store the output of each iteration in it.
x <- 1:3
out <- vector('list', length(x))
for (i in x) {
  test <- chisq.test(df[, i], df[, 4])
  out[[i]] <- data.frame("X" = colnames(df[i]),
                         "Y" = colnames(df[4]),
                         "Chi.Square" = round(test$statistic, 3),
                         "df" = test$parameter,
                         "p.value" = round(test$p.value, 3))
}
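The loop above leaves 'out' as a list of one-row data frames. As a rough sketch of how they could be stacked into a single table, the snippet below rebuilds the question's data by hand (the data.frame() call is my transcription of the posted table, not the OP's code) and binds the pieces with do.call(rbind, ...). The suppressWarnings() and unname() calls are my additions: the first hides the small-expected-count warning that chisq.test() gives on a sample this small, the second keeps the "X-squared" label out of the row names.

```r
# Data from the question, typed in by hand (assumed to match the OP's df)
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E"),
  V2 = c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", "van", "car"),
  V3 = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  V4 = c("Good", "Good", "Good", "Satisfactory", "Excellent", "Poor",
         "Poor", "Satisfactory", "Excellent")
)

x <- 1:3
out <- vector("list", length(x))
for (i in x) {
  # Test column i against the fixed column V4
  test <- suppressWarnings(chisq.test(df[, i], df[, 4]))
  out[[i]] <- data.frame(X = colnames(df)[i],
                         Y = colnames(df)[4],
                         Chi.Square = round(unname(test$statistic), 3),
                         df = unname(test$parameter),
                         p.value = round(test$p.value, 3))
}

# Stack the per-pair results: one row per variable pair
res <- do.call(rbind, out)
res
```

Keeping the list and binding once at the end avoids growing a data frame inside the loop, which tends to matter once the number of combinations gets large.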
Upvotes: 3
Reputation: 389325
You can use lapply to perform this loop.
x <- 1:3
out <- do.call(rbind, lapply(x, function(i) {
  test <- chisq.test(df[, i], df[, 4])
  data.frame("X" = colnames(df)[i],
             "Y" = colnames(df[4]),
             "Chi.Square" = round(test$statistic, 3),
             "df" = test$parameter,
             "p.value" = round(test$p.value, 3))
}))
rownames(out) <- NULL
out
#   X  Y Chi.Square df p.value
#1 V1 V4      14.25 12   0.285
#2 V2 V4      12.75  6   0.047
#3 V3 V4       2.25  3   0.522
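Since the real analysis is said to cover many more combinations, it may be easier to loop over column names than over hard-coded positions. The following is only a sketch under that assumption: 'target', 'predictors' and the helper chisq_row() are names I made up for illustration, the data frame is retyped from the question, and suppressWarnings() is added to silence the small-sample warning from chisq.test().

```r
# Data from the question, typed in by hand (assumed to match the OP's df)
df <- data.frame(
  V1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E"),
  V2 = c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", "van", "car"),
  V3 = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  V4 = c("Good", "Good", "Good", "Satisfactory", "Excellent", "Poor",
         "Poor", "Satisfactory", "Excellent")
)

target <- "V4"                            # the fixed variable
predictors <- setdiff(names(df), target)  # every other column

# Hypothetical helper: one result row for a single predictor vs. the target
chisq_row <- function(var) {
  test <- suppressWarnings(chisq.test(df[[var]], df[[target]]))
  data.frame(X = var,
             Y = target,
             Chi.Square = round(unname(test$statistic), 3),
             df = unname(test$parameter),
             p.value = round(test$p.value, 3))
}

res_by_name <- do.call(rbind, lapply(predictors, chisq_row))
res_by_name
```

With this layout, adding columns to df extends the analysis without touching the loop, and changing 'target' switches the fixed variable.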
Upvotes: 1