How can I write a function to create multiple tables from dataframe columns?

Question

I would like to create contingency tables and run chisq.test() etc. for multiple items in a dataframe.

Various attempts have resulted in 'Error in table(y$x, y$q2) : all arguments must have the same length'.

I think the example below focuses on my central problem, though ultimately I'd write a more complex function. I'd be interested in solutions to my specific function or to my overall approach. Thanks!

my_df <- structure(list(q1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
                              .Label = c("Choice1", "Choice2"), 
                              class = "factor"), 
                        q2 = structure(c(1L, 1L, 4L, 5L, 4L, 1L, 1L, 4L), 
                              .Label = c("Agree", "Disagree","N/A or No Opinion", 
                                         "Strongly Agree", "Strongly Disagree"), 
                              class = "factor"),
                        q3 = structure(c(1L, 4L, 1L, 4L, 1L, 4L, 4L, 4L), 
                              .Label = c("Agree", "Disagree","N/A or No Opinion", 
                                    "Strongly Agree", "Strongly Disagree"), 
                              class = "factor")),
                   row.names = c(NA, -8L), 
                   class = c("tbl_df", "tbl", "data.frame"))

my_fn <- function(x, y) {
 table(y$x, y$`q2`)

}

my_fn(names(my_df)[1], my_df)
#Error in table(y$x, y$q2) : all arguments must have the same length

lapply(names(my_df), my_fn, my_df)
#Error in table(y$x, y$q2) : all arguments must have the same length

Duck · Accepted Answer

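Try this. The error comes from using $ with a variable: y$x looks for a column literally named "x" rather than the column whose name is stored in x. No such column exists, so y$x returns NULL and table() receives arguments of different lengths. When the column name is held in a string, use [[ ]] instead, because it evaluates its argument. Here is the code, with slight changes to your function and some examples: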

# Function
my_fn <- function(x, y) {
  table(y[[x]], y[['q2']])
}
# Code
my_fn('q1', my_df)
lapply(names(my_df), my_fn, y = my_df)

Output:

[[1]]
         
          Agree Disagree N/A or No Opinion Strongly Agree Strongly Disagree
  Choice1     4        0                 0              3                 1
  Choice2     0        0                 0              0                 0

[[2]]
                   
                    Agree Disagree N/A or No Opinion Strongly Agree Strongly Disagree
  Agree                 4        0                 0              0                 0
  Disagree              0        0                 0              0                 0
  N/A or No Opinion     0        0                 0              0                 0
  Strongly Agree        0        0                 0              3                 0
  Strongly Disagree     0        0                 0              0                 1

[[3]]
                   
                    Agree Disagree N/A or No Opinion Strongly Agree Strongly Disagree
  Agree                 1        0                 0              2                 0
  Disagree              0        0                 0              0                 0
  N/A or No Opinion     0        0                 0              0                 0
  Strongly Agree        3        0                 0              1                 1
  Strongly Disagree     0        0                 0              0                 0
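
Since the question also mentions running chisq.test() on each table, here is a minimal sketch of one way to extend the same [[ ]] idea. The helper names cross_tab and safe_chisq and the ref argument are made up for illustration, and the data is rebuilt as a plain data.frame so the block runs without tibble. Dropping all-zero rows and columns before testing is an assumption about what you want, not something chisq.test() requires you to do.

```r
# Rebuild the example data from the question as a plain data.frame
lik <- c("Agree", "Disagree", "N/A or No Opinion",
         "Strongly Agree", "Strongly Disagree")
my_df <- data.frame(
  q1 = factor(rep("Choice1", 8), levels = c("Choice1", "Choice2")),
  q2 = factor(lik[c(1, 1, 4, 5, 4, 1, 1, 4)], levels = lik),
  q3 = factor(lik[c(1, 4, 1, 4, 1, 4, 4, 4)], levels = lik)
)

# Hypothetical helper: cross-tabulate column `x` against column `ref`.
# Both names are strings, so [[ ]] is used instead of $.
cross_tab <- function(x, y, ref = "q2") {
  table(y[[x]], y[[ref]], dnn = c(x, ref))
}

# Compare every column except the reference column with q2.
# setNames() makes lapply() return a list named by column.
items <- setdiff(names(my_df), "q2")
tabs <- lapply(setNames(items, items), cross_tab, y = my_df)

# Hypothetical helper: drop all-zero rows and columns, then run the
# test only if at least a 2 x 2 table remains; otherwise return NULL.
safe_chisq <- function(tab) {
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  if (nrow(tab) < 2 || ncol(tab) < 2) return(NULL)
  chisq.test(tab)
}

tests <- lapply(tabs, safe_chisq)
```

With this sample data, tests$q1 is NULL because q1 has only one observed level, and tests$q3 is an "htest" object. The counts here are very small, so R warns that the chi-squared approximation may be incorrect; fisher.test() may be a better fit for tables this sparse.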
