Reputation: 13
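I'm still new to R and data analytics in general. I have a data set containing 2 parts: the answers to 20 questions and a set of socio-demographic variables. I want to run a chi-squared test of each question against each socio-demographic variable.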
Here is a scaled-down sample of the data set (it contains only 3 of the 20 questions and 3 of the socio-demographic variables), in case it is needed:
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))
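I would like to collect the results in a table of this shape, with one row per question and one column per socio-demographic variable:
data.frame(Age = c(0, 0, 0),
           Education = c(0, 0, 0),
           Married = c(0, 0, 0),
           row.names = c("Q1", "Q2", "Q3"))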
I tried some of the apply functions, but I could not get them to work.
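Since the question mentions the apply family, here is a minimal base-R sketch with two nested sapply() calls. It assumes the template table is meant to hold p-values (the question does not say which statistic it wants); the names questions, demographics and p_values are illustrative only. With just 10 rows, chisq.test() warns that its approximation may be incorrect; the sketch silences that warning, which does not make the results any more reliable.

```r
# Sample data from the question
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))

questions <- grep("^Q\\d+$", names(df), value = TRUE)
demographics <- c("ageRange", "education", "maritalStatus")

# Outer sapply() runs over the socio-demographic columns, inner sapply()
# over the questions, so the result is a matrix with one row per question
# and one column per variable. Each cell holds a chi-squared p-value.
# suppressWarnings() only hides the small-sample warning.
p_values <- sapply(demographics, function(v) {
  sapply(questions, function(q) {
    suppressWarnings(chisq.test(df[[q]], df[[v]])$p.value)
  })
})

# Rename the columns to match the template table in the question
colnames(p_values) <- c("Age", "Education", "Married")
round(p_values, 3)
```

Swapping `$p.value` for `$statistic` fills the same layout with the test statistics instead.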
Upvotes: 0
Views: 484
Reputation: 887951
We could also loop over the question columns with purrr::map():
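library(purrr)   # map(), list_rbind() (list_rbind needs purrr >= 1.0.0)
library(broom)   # tidy()
library(tidyr)   # pivot_longer(), unpack()
library(stringr) # str_subset()
library(dplyr)

# loop over every question column (Q1, Q2, ...) in df
str_subset(names(df), "^Q\\d+$") %>%
  map(~ df %>%
        # current question plus the socio-demographic columns
        select(all_of(.x), ageRange:maritalStatus) %>%
        # long format: one row per answer and socio-demographic variable
        pivot_longer(cols = -1) %>%
        group_by(ID = .x, name) %>%
        # test the question against each variable; tidy() turns the htest
        # object into a one-row tibble, stored as a data frame column
        # (.data[[.x]] replaces cur_data()[[1]], deprecated in dplyr 1.1.0)
        summarise(stats = tidy(chisq.test(.data[[.x]], value)),
                  .groups = "drop")) %>%
  list_rbind() %>%
  # spread the packed stats column into statistic, p.value, parameter, method
  unpack(stats)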
Output:
# A tibble: 9 × 6
ID name statistic p.value parameter method
<chr> <chr> <dbl> <dbl> <int> <chr>
1 Q1 ageRange 15.6 0.209 12 Pearson's Chi-squared test
2 Q1 education 27.5 0.122 20 Pearson's Chi-squared test
3 Q1 maritalStatus 2.71 0.608 4 Pearson's Chi-squared test
4 Q2 ageRange 15.6 0.209 12 Pearson's Chi-squared test
5 Q2 education 20.8 0.407 20 Pearson's Chi-squared test
6 Q2 maritalStatus 2.71 0.608 4 Pearson's Chi-squared test
7 Q3 ageRange 14.6 0.265 12 Pearson's Chi-squared test
8 Q3 education 21.7 0.359 20 Pearson's Chi-squared test
9 Q3 maritalStatus 3.06 0.549 4 Pearson's Chi-squared test
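The result above is in long format (one row per question and variable), while the question asked for one row per question and one column per variable. A sketch of that last reshaping step with tidyr::pivot_wider(), assuming the p-value is the statistic wanted: so that it runs on its own, it rebuilds a long table of p-values with expand_grid() and rowwise() instead of reusing the pipeline output, and the names long and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))

# One row per (question, variable) pair, like the long output above
long <- expand_grid(ID = c("Q1", "Q2", "Q3"),
                    name = c("ageRange", "education", "maritalStatus")) %>%
  rowwise() %>%
  # ID and name are single strings within each row
  mutate(p.value = suppressWarnings(chisq.test(df[[ID]], df[[name]])$p.value)) %>%
  ungroup()

# Spread into the question's layout: one column per socio-demographic variable
wide <- long %>%
  pivot_wider(names_from = name, values_from = p.value)
wide
```

The same pivot_wider() call should work on the long result of either answer, provided it has the ID, name and p.value columns and any other columns (statistic, parameter, method) are dropped first with select().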
Upvotes: 1
Reputation: 79276
We could do something like this. It is quite verbose, but it may help as a starting point:
The idea is to build, for each Q column, a data frame holding that question plus the socio-demographic columns, run the tests on it, and then bind the per-question results together at the end.
The tidy() function from the broom package is quite handy here:
library(dplyr)
library(tidyr)
library(purrr)   # map() comes from purrr, not dplyr
library(broom)

# Q1 against every socio-demographic variable
Q1 <- df %>%
  select(-Q2, -Q3) %>%         # keep Q1 and the socio-demographic columns
  pivot_longer(-Q1) %>%        # one row per answer and variable
  nest(data = -name) %>%       # one nested data frame per variable
  mutate(stats = map(data, ~ tidy(chisq.test(.x$Q1, .x$value)))) %>%
  select(-data) %>%
  unnest(stats)

Q2 <- df %>%
  select(-Q1, -Q3) %>%
  pivot_longer(-Q2) %>%
  nest(data = -name) %>%
  mutate(stats = map(data, ~ tidy(chisq.test(.x$Q2, .x$value)))) %>%
  select(-data) %>%
  unnest(stats)

Q3 <- df %>%
  select(-Q1, -Q2) %>%
  pivot_longer(-Q3) %>%
  nest(data = -name) %>%
  mutate(stats = map(data, ~ tidy(chisq.test(.x$Q3, .x$value)))) %>%
  select(-data) %>%
  unnest(stats)

# .id numbers the inputs "1", "2", "3"; turn that into "Q1", "Q2", "Q3"
bind_rows(Q1, Q2, Q3, .id = "Q") %>%
  mutate(ID = paste0("Q", Q), .before = 1, .keep = "unused")
ID name statistic p.value parameter method
<chr> <chr> <dbl> <dbl> <int> <chr>
1 Q1 ageRange 15.6 0.209 12 Pearson's Chi-squared test
2 Q1 education 27.5 0.122 20 Pearson's Chi-squared test
3 Q1 maritalStatus 2.71 0.608 4 Pearson's Chi-squared test
4 Q2 ageRange 15.6 0.209 12 Pearson's Chi-squared test
5 Q2 education 20.8 0.407 20 Pearson's Chi-squared test
6 Q2 maritalStatus 2.71 0.608 4 Pearson's Chi-squared test
7 Q3 ageRange 14.6 0.265 12 Pearson's Chi-squared test
8 Q3 education 21.7 0.359 20 Pearson's Chi-squared test
9 Q3 maritalStatus 3.06 0.549 4 Pearson's Chi-squared test
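The three blocks above differ only in the question name, so with all 20 questions it may be easier to fold them into one function and map over the question columns. A sketch under that assumption: test_question() is a hypothetical helper name, and the socio-demographic columns are listed by hand.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Sample data from the question
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))

# Hypothetical helper: test one question column `q` against every column
# in `vars`, returning one tidy row per socio-demographic variable
test_question <- function(survey, q, vars) {
  survey %>%
    select(all_of(c(q, vars))) %>%
    pivot_longer(-all_of(q)) %>%      # columns: q, name, value
    nest(data = -name) %>%            # one nested data frame per variable
    mutate(stats = map(data, ~ tidy(suppressWarnings(
      chisq.test(.x[[q]], .x$value))))) %>%
    select(-data) %>%
    unnest(stats)
}

questions <- grep("^Q\\d+$", names(df), value = TRUE)
demographics <- c("ageRange", "education", "maritalStatus")

# set_names() names the list by question, so bind_rows() can use it as ID
results <- questions %>%
  set_names() %>%
  map(~ test_question(df, .x, demographics)) %>%
  bind_rows(.id = "ID")
results
```

Only the questions vector and the demographics vector need to change for the full data set; the per-question pipeline is written once.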
Upvotes: 1