Yunpeng Ding

R - Chi-square test using "group_by" in tidyverse

I am trying to run multiple chi-square tests on a dataset, one per group, using group_by() from the tidyverse.

The dataset looks like this:

brand <- c("sunrise", "sunrise", "waterloo", "waterloo", "yankin", "yankin")
generation <- c("90s", "80s", "90s", "80s", "90s", "80s")
positive_opinion <- c(23, 50, 65, 20, 34, 36)
negative_opinion <- c(85, 40, 30, 90, 52, 53)
db <- data.frame(brand, generation, positive_opinion, negative_opinion)

For each brand, I want to test whether opinions differ between generations.

I can do this for a single brand with the code below:

library(dplyr)

db %>%
  filter(brand == "sunrise") %>%
  select(positive_opinion, negative_opinion) %>%
  chisq.test()
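To see what the working pipeline computes for one brand, here is a hand-written sketch for the sunrise counts (rows = generations, columns = opinions). It assumes R's default for 2x2 tables, Yates' continuity correction (correct = TRUE), under which chisq.test() subtracts min(0.5, |O - E|) from each absolute deviation; the variable names are illustrative only.

```r
# Sunrise counts as a 2x2 contingency table (rows = generations, columns = opinions)
tab <- matrix(c(23, 50, 85, 40), nrow = 2,
              dimnames = list(generation = c("90s", "80s"),
                              opinion = c("positive", "negative")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Yates' continuity correction, applied by chisq.test() to 2x2 tables by default
yates <- min(0.5, abs(tab - expected))
stat <- sum((abs(tab - expected) - yates)^2 / expected)
p_manual <- pchisq(stat, df = 1, lower.tail = FALSE)

# Compare with the built-in test
res <- chisq.test(tab)
c(manual = p_manual, chisq.test = res$p.value)
```

With chisq.test(tab, correct = FALSE) the correction is dropped, which corresponds to setting yates <- 0 in the manual version.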

However, I have many brands, so I want code that returns the p-value for every brand. I tried the following, but it does not work:

db %>% 
  group_by(brand) %>% 
  select(positive_opinion, negative_opinion) %>% 
  chisq.test()

The error message is:

Error in sum(x) : invalid 'type' (character) of argument

The logic seems right to me, but I don't understand what causes the error. Any comments are welcome.
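One likely cause, sketched below assuming dplyr 1.0 or later: select() on a grouped data frame keeps the grouping column (dplyr reports "Adding missing grouping variables"), so chisq.test() receives a data frame that still contains the character column brand. In addition, chisq.test() is not group-aware, so even without that column it would run one test on all six rows rather than one test per brand.

```r
library(dplyr)

brand <- c("sunrise", "sunrise", "waterloo", "waterloo", "yankin", "yankin")
generation <- c("90s", "80s", "90s", "80s", "90s", "80s")
positive_opinion <- c(23, 50, 65, 20, 34, 36)
negative_opinion <- c(85, 40, 30, 90, 52, 53)
db <- data.frame(brand, generation, positive_opinion, negative_opinion)

# select() on a grouped data frame adds the grouping column `brand` back
selected <- db %>%
  group_by(brand) %>%
  select(positive_opinion, negative_opinion)
names(selected)

# chisq.test() ignores the grouping: it converts all 6 rows to a matrix,
# which becomes a character matrix because of `brand`, and sum() then fails
err <- tryCatch(chisq.test(selected), error = function(e) conditionMessage(e))
err
```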

Thank you all!

Answers (2)

Yunpeng Ding

I finally figured this out myself. I am sure the code can be improved, but at least it works now:

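library(tidyverse)  # dplyr, tidyr (nest) and purrr (map_dbl)

db %>%
  select(-generation) %>%   # keep only brand and the two count columns
  group_by(brand) %>%
  nest() %>%                # one row per brand; `data` holds that brand's 2x2 counts
  mutate(chisq_p = map_dbl(data, ~ chisq.test(.x)$p.value)) %>%
  ungroup()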

The code above gives the correct p-values.
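For comparison, a base-R alternative (not the approach in this answer) that avoids tidyr and purrr: split the two count columns by brand and apply chisq.test() to each piece. Like the nested version, it relies on chisq.test() converting each 2x2 data frame to a matrix; the names pieces and p_values are illustrative.

```r
brand <- c("sunrise", "sunrise", "waterloo", "waterloo", "yankin", "yankin")
generation <- c("90s", "80s", "90s", "80s", "90s", "80s")
positive_opinion <- c(23, 50, 65, 20, 34, 36)
negative_opinion <- c(85, 40, 30, 90, 52, 53)
db <- data.frame(brand, generation, positive_opinion, negative_opinion)

# One 2x2 data frame of counts per brand (rows = generations)
pieces <- split(db[, c("positive_opinion", "negative_opinion")], db$brand)

# chisq.test() converts each data frame to a matrix; keep only the p-value
p_values <- sapply(pieces, function(counts) chisq.test(counts)$p.value)
p_values
```

The result is a named numeric vector with one p-value per brand, in alphabetical order of brand.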

Samet Sökel

If you use a different function such as chisq_test(), the call will look different; here I used base R's chisq.test(). You can use dplyr's summarise() to run code once per group:

db %>%
  group_by(brand) %>%
  # cbind() turns each brand's two count columns into a 2x2 table of counts
  summarise(test_result = chisq.test(cbind(positive_opinion, negative_opinion))$p.value)
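Note that chisq.test() treats two vector arguments differently from a matrix of counts: chisq.test(x, y) converts x and y to factors and tests table(x, y), so the counts themselves are never used. On this data every brand's table(x, y) is a 2x2 table of 0s and 1s, which gives a p-value of 1. A sketch for the sunrise counts; suppressWarnings() only hides the "Chi-squared approximation may be incorrect" warning caused by the tiny expected counts.

```r
# Sunrise counts for the 90s and 80s generations
pos <- c(23, 50)
neg <- c(85, 40)

# Two vectors: the values are treated as factor levels and cross-tabulated,
# giving a 2x2 table of 0s and 1s instead of the counts
print(table(pos, neg))
p_two_vectors <- suppressWarnings(chisq.test(pos, neg))$p.value

# One matrix: cbind() gives the 2x2 table of counts (rows = generations)
p_counts <- chisq.test(cbind(positive = pos, negative = neg))$p.value

c(two_vectors = p_two_vectors, counts = p_counts)
```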
