Reputation: 35
I would like to create contingency tables and run chisq.test() etc. for multiple items in a dataframe.
Various attempts have resulted in 'Error in table(y$x, y$q2) : all arguments must have the same length'.
I think the example below focuses on my central problem, though ultimately I'd write a more complex function. I'd be interested in solutions to my specific function or to my overall approach. Thanks!
my_df <- structure(list(q1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("Choice1", "Choice2"),
class = "factor"),
q2 = structure(c(1L, 1L, 4L, 5L, 4L, 1L, 1L, 4L),
.Label = c("Agree", "Disagree","N/A or No Opinion",
"Strongly Agree", "Strongly Disagree"),
class = "factor"),
q3 = structure(c(1L, 4L, 1L, 4L, 1L, 4L, 4L, 4L),
.Label = c("Agree", "Disagree","N/A or No Opinion",
"Strongly Agree", "Strongly Disagree"),
class = "factor")),
row.names = c(NA, -8L),
class = c("tbl_df", "tbl", "data.frame"))
my_fn <- function(x, y) {
table(y$x, y$`q2`)
}
my_fn(names(my_df)[1], my_df)
#Error in table(y$x, y$q2) : all arguments must have the same length
lapply(names(my_df), my_fn, my_df)
#Error in table(y$x, y$q2) : all arguments must have the same length
Upvotes: 1
Views: 632
Reputation: 887048
We can use count() from dplyr, which returns the counts as a tibble.
library(dplyr)
library(purrr)
my_fn <- function(data, col1) {
  # Turn the column name (a string) into a symbol and unquote it for count()
  data %>%
    count(!!rlang::sym(col1), q2)
}
map(names(my_df), ~ my_fn(my_df, .x))
#[[1]]
# A tibble: 3 x 3
# q1 q2 n
# <fct> <fct> <int>
#1 Choice1 Agree 4
#2 Choice1 Strongly Agree 3
#3 Choice1 Strongly Disagree 1
#[[2]]
# A tibble: 3 x 2
# q2 n
# <fct> <int>
#1 Agree 4
#2 Strongly Agree 3
#3 Strongly Disagree 1
#[[3]]
# A tibble: 5 x 3
# q3 q2 n
# <fct> <fct> <int>
#1 Agree Agree 1
#2 Agree Strongly Agree 2
#3 Strongly Agree Agree 3
#4 Strongly Agree Strongly Agree 1
#5 Strongly Agree Strongly Disagree 1
Upvotes: 0
Reputation: 39595
Try this. The issue is the use of $ for variables: y$x looks for a column literally named x rather than the column whose name is stored in x, so it returns NULL, and table() then reports that its arguments have different lengths. When the column name is held in a string, use [[ ]] instead so that the string is evaluated. Here is the code, with slight changes to your function, along with some examples:
#Function
my_fn <- function(x, y) {
table(y[[x]], y[['q2']])
}
#Code
my_fn('q1', my_df)
lapply(names(my_df),my_fn,y=my_df)
Output:
[[1]]
          Agree Disagree N/A or No Opinion Strongly Agree Strongly Disagree
  Choice1     4        0                 0              3                 1
  Choice2     0        0                 0              0                 0

[[2]]
                    Agree Disagree N/A or No Opinion Strongly Agree Strongly Disagree
  Agree                 4        0                 0              0                 0
  Disagree              0        0                 0              0                 0
  N/A or No Opinion     0        0                 0              0                 0
  Strongly Agree        0        0                 0              3                 0
  Strongly Disagree     0        0                 0              0                 1

[[3]]
                    Agree Disagree N/A or No Opinion Strongly Agree Strongly Disagree
  Agree                 1        0                 0              2                 0
  Disagree              0        0                 0              0                 0
  N/A or No Opinion     0        0                 0              0                 0
  Strongly Agree        3        0                 0              1                 1
  Strongly Disagree     0        0                 0              0                 0
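Since the question also mentions chisq.test(), here is a minimal sketch of how the [[ ]] approach might be extended. It uses a small made-up data frame (toy_df) rather than the survey data, and the helper name tab_and_test and its ref argument are hypothetical, chosen only for illustration. Warnings are suppressed because the counts are too small for the chi-squared approximation to be reliable; with real data you would want to see them.

```r
# Hypothetical toy data, not the asker's survey: two groups, two answers.
toy_df <- data.frame(
  grp = factor(c("a", "a", "a", "b", "b", "b", "a", "b")),
  ans = factor(c("yes", "yes", "no", "no", "no", "yes", "yes", "no"))
)

col <- "grp"

# `$` does not evaluate `col`; it looks for a column literally named "col".
is.null(toy_df$col)     # NULL column, hence the "same length" error in table()

# `[[` evaluates `col` and returns the grp column.
length(toy_df[[col]])

# Hypothetical helper: cross-tabulate column x against a reference column
# and run a chi-squared test on the resulting contingency table.
tab_and_test <- function(x, y, ref = "ans") {
  tab <- table(y[[x]], y[[ref]])
  # Small counts trigger an approximation warning; silenced for this sketch.
  list(table = tab, test = suppressWarnings(chisq.test(tab)))
}

res <- tab_and_test("grp", toy_df)
res$table
res$test$p.value
```

The helper can be passed to lapply() in the same way as my_fn above, for example lapply(c("grp"), tab_and_test, y = toy_df), which gives one table and one test result per column name.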
Upvotes: 1