Jack Friend

Reputation: 13

Multiple columns in categorical Chi-Square

I am working on a chi-square analysis in R. I have many subjects, each with previously determined boolean values:

# header > species   is_reptile  is_animal  is_alive
# 1 >      lizard    yes          yes        yes    
# 2 >      snake     yes          yes        yes    
# 3 >      cat       no           yes        yes    
# 4 >      flower    no           no         yes

I want to perform a test (I believe a chi-square test, but I am not sure) to determine how these previously determined variables are linked to each other.

I previously used this R code, but it does not work with all the columns at once, as I would like:

chisq.test(data$is_reptile, data$is_animal)

# > Pearson's Chi-squared test with Yates' continuity correction
# > data:  data$is_reptile and data$is_animal
# > X-squared = 0, df = 1, p-value = 1

Is there a test (something like chi_square(data, data$species)) that can produce a table similar to a Pearson correlation matrix?

            is_reptile    is_animal    is_alive
is_reptile  1.0           0.05         0.5
is_animal   0.05          1.0          0.05
is_alive    0.5           0.05         1.0

Upvotes: 0

Views: 437

Answers (2)

jay.sf

Reputation: 72828

You may stack and table your data before calling chisq.test.

chisq.test(table(stack(dat[-1])))
#         Pearson's Chi-squared test
# 
# data:  table(stack(dat[-1]))
# X-squared = 0.68182, df = 2, p-value = 0.7111
# 
# Warning message:
# In chisq.test(table(stack(dat[-1]))) :
#   Chi-squared approximation may be incorrect

Using pipes (same result):

dat[-1] |>
  stack() |>
  table() |>
  chisq.test()

Note: Since you are not sure whether this is the right test for you, perhaps take a look at this related post on Cross Validated.
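
The stacked table above compares the share of "yes" answers between columns. If the goal is instead the pairwise matrix sketched in the question, one possible approach is to test every pair of columns separately and collect the p-values. The following is only a sketch under assumptions: it reuses the five-row example data, the names `vars` and `pmat` are made up for illustration, and it swaps in `fisher.test` (Fisher's exact test) for `chisq.test` because the counts are very small. The diagonal is left as NA rather than 1, since a column is not tested against itself.

```r
# Example data (same values as the 'dat' object in this answer)
dat <- data.frame(
  species    = c("lizard", "snake", "cat", "flower", "dinosaur"),
  is_reptile = c("yes", "yes", "no", "no", "yes"),
  is_animal  = c("yes", "yes", "yes", "no", "yes"),
  is_alive   = c("yes", "yes", "yes", "yes", "no")
)

# All yes/no columns, i.e. everything except the species label
vars <- setdiff(names(dat), "species")

# Empty square matrix to hold one p-value per pair of columns
pmat <- matrix(NA_real_, length(vars), length(vars),
               dimnames = list(vars, vars))

# Run Fisher's exact test on the 2x2 table of each pair
for (pair in combn(vars, 2, simplify = FALSE)) {
  tab <- table(dat[[pair[1]]], dat[[pair[2]]])
  p   <- fisher.test(tab)$p.value
  pmat[pair[1], pair[2]] <- p
  pmat[pair[2], pair[1]] <- p
}

round(pmat, 3)
```

Note that a column with only one observed value (such as is_alive in the four-row data from the question) does not give a 2x2 table, so such columns would need to be dropped or handled separately. With many columns, the p-values should probably also be adjusted for multiple testing, for example with `p.adjust`.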


Data:

dat <- structure(list(species = c("lizard", "snake", "cat", "flower", 
"dinosaur"), is_reptile = c("yes", "yes", "no", "no", "yes"), 
    is_animal = c("yes", "yes", "yes", "no", "yes"), is_alive = c("yes", 
    "yes", "yes", "yes", "no")), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 2

Rui Barradas

Reputation: 76412

Something like this?
Reshape the data to long format, table it and run the chi-squared test.

library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr

df1 %>%
  pivot_longer(-1) %>%
  select(-1) %>%
  table() -> tbl1

tbl1
#            value
#name         no yes
#  is_alive    0   4
#  is_animal   1   3
#  is_reptile  2   2

chisq.test(tbl1)
#
#   Pearson's Chi-squared test
#
#data:  tbl1
#X-squared = 2.6667, df = 2, p-value = 0.2636
#
#Warning message:  
#In chisq.test(tbl1) : Chi-squared approximation may be incorrect
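
The warning appears because the expected cell counts are very small. As a hedged sketch (not part of the original answer), the expected counts can be inspected directly, and the p-value can be obtained by Monte Carlo simulation or from Fisher's exact test instead. The table is rebuilt by hand from the counts printed above so that the snippet runs on its own; the names `expected`, `sim` and `exact` are illustrative.

```r
# Rebuild the 3x2 table from the counts shown above
tbl1 <- as.table(matrix(
  c(0, 1, 2,   # "no" column
    4, 3, 2),  # "yes" column
  ncol = 2,
  dimnames = list(name  = c("is_alive", "is_animal", "is_reptile"),
                  value = c("no", "yes"))
))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tbl1), colSums(tbl1)) / sum(tbl1)
expected  # every cell is below 5, which is what triggers the warning

# Option 1: Monte Carlo p-value instead of the chi-squared approximation
set.seed(1)  # only to make the simulation reproducible
sim <- chisq.test(tbl1, simulate.p.value = TRUE, B = 10000)
sim$p.value

# Option 2: Fisher's exact test, which also accepts tables larger than 2x2
exact <- fisher.test(tbl1)
exact$p.value
```

With only four subjects, neither option has much power, so a non-significant result here says little about whether the columns really differ.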

Data

x <- "species   is_reptile  is_animal  is_alive
lizard    yes          yes        yes    
snake     yes          yes        yes    
cat       no           yes        yes    
flower    no           no         yes"

df1 <- read.table(textConnection(x), header = TRUE)

Upvotes: 0
