Reputation: 33
I am trying to run multiple chi-square tests in a dataset using group_by in tidyverse.
The dataset looks like this:
brand <- c("sunrise", "sunrise", "waterloo", "waterloo", "yankin", "yankin")
generation <- c("90s", "80s", "90s", "80s", "90s", "80s")
positive_opinion <- c(23, 50, 65, 20, 34, 36)
negative_opinion <- c(85, 40, 30, 90, 52, 53)
db <- data.frame(brand, generation, positive_opinion, negative_opinion)
I want to see whether opinions differ between generations for each brand.
I am able to do this for a single brand using the code below:
db %>%
filter(brand == "sunrise") %>%
select(positive_opinion, negative_opinion) %>%
chisq.test()
However, I have many brands, so I want the code to give me the p-values for all brands at once. I have tried this, but it does not work:
db %>%
group_by(brand) %>%
select(positive_opinion, negative_opinion) %>%
chisq.test()
The error message is: Error in sum(x) : invalid 'type' (character) of argument
It seems the logic is right, but I don't know what causes the error. Any comment is welcome.
Thank you all!
Upvotes: 0
Views: 2297
Reputation: 33
I finally figured this out myself. I am sure the code can be improved, but at least it does the job now.
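library(tidyverse)  # dplyr, tidyr (nest) and purrr (map_dbl)
db %>%
select(-generation) %>%  # keep only brand and the two count columns
group_by(brand) %>%
nest() %>%  # one 2x2 table of counts per brand
mutate(chisq_p = map_dbl(data, ~ chisq.test(.x)$p.value))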
The code above gives the correct p-values.
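For context on why the original attempt failed: select() on a grouped data frame keeps the grouping column, so chisq.test() received the character column brand alongside the counts, and chisq.test() does not split its input by group on its own. A minimal sketch that makes this visible, assuming dplyr is installed (the name kept is only illustrative):

```r
library(dplyr)

db <- data.frame(
  brand = c("sunrise", "sunrise", "waterloo", "waterloo", "yankin", "yankin"),
  generation = c("90s", "80s", "90s", "80s", "90s", "80s"),
  positive_opinion = c(23, 50, 65, 20, 34, 36),
  negative_opinion = c(85, 40, 30, 90, 52, 53)
)

# select() on a grouped data frame re-adds the grouping variable
kept <- db %>%
  group_by(brand) %>%
  select(positive_opinion, negative_opinion)

# brand is still present, so the data handed to chisq.test() is not purely numeric
names(kept)
sapply(kept, class)
```

Dropping generation first and nesting by brand, as above, means each nested data frame holds only the two numeric count columns.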
Upvotes: 2
Reputation: 2670
It would be different with your chisq_test(), but you can use dplyr's summarise() function to run some code for each group. I used base chisq.test():
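db %>%
group_by(brand) %>%
# cbind() builds the 2x2 count table; two separate vectors would be cross-tabulated as factors
summarise(test_result = chisq.test(cbind(positive_opinion, negative_opinion))$p.value)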
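As a sanity check, the per-group result can be compared with the single-brand pipeline from the question. The sketch below assumes the counts are bound into a 2x2 matrix with cbind() so that chisq.test() treats them as a contingency table (passing two separate vectors would cross-tabulate them as factors instead); the names by_brand, sunrise_test and sunrise_p are only illustrative.

```r
library(dplyr)

db <- data.frame(
  brand = c("sunrise", "sunrise", "waterloo", "waterloo", "yankin", "yankin"),
  generation = c("90s", "80s", "90s", "80s", "90s", "80s"),
  positive_opinion = c(23, 50, 65, 20, 34, 36),
  negative_opinion = c(85, 40, 30, 90, 52, 53)
)

# One p-value per brand, computed from that brand's 2x2 table of counts
by_brand <- db %>%
  group_by(brand) %>%
  summarise(p_value = chisq.test(cbind(positive_opinion, negative_opinion))$p.value)

# Reference value: the single-brand pipeline from the question
sunrise_test <- db %>%
  filter(brand == "sunrise") %>%
  select(positive_opinion, negative_opinion) %>%
  chisq.test()
sunrise_p <- sunrise_test$p.value

# The grouped result for "sunrise" should agree with the reference value
all.equal(by_brand$p_value[by_brand$brand == "sunrise"], sunrise_p)
```

If the two values agree, the grouped call is testing the same table as the manual filter for that brand.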
Upvotes: 0