Phil_T

Generate all possible contingency tables from two data frames in R

Having a brain fart today, so I'm hoping this is an obvious fix that I'm missing. I have two data frames: one of endpoints and one of risk factors. I want to calculate the risk ratios for every combination of risk factor and outcome, so I need a function that generates all possible contingency tables from the two data frames. It would be nice to have a framework that lets me plug some statistics functions into the contingency table function.

Example data:

a = c(1,0,1,1,1)
b = c(0,1,1,0,0)
c = c(1,1,0,0,1)
d = c(0,0,0,1,1)

risk = data.frame(a,b)
endpoint = data.frame(c,d)

Again, if you can put this together as a loop that lets me compute statistics as each contingency table is created, I would appreciate it, since I could then copy and paste my existing code into the function.

Thanks

Answers (1)

acylam

It is not entirely clear what kind of contingency table you are trying to create, but the following builds a table for every combination of a risk column and an endpoint column:

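# transpose so that each column holds one (risk, endpoint) pair of names,
# then build one table per pair
lapply(data.frame(t(expand.grid(names(risk), names(endpoint), 
                                stringsAsFactors = FALSE)), stringsAsFactors = FALSE), 
       function(x) table(risk[[x[1]]], endpoint[[x[2]]], dnn = x))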

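Note that stringsAsFactors = FALSE appears twice, once for expand.grid and once for data.frame. expand.grid() converts character vectors to factors by default, and data.frame() also did so before R 4.0.0. Factors are not what you want here, because the [[ operator indexes with a factor's integer codes rather than its labels, so risk[[x[1]]] could silently pick the wrong column. To simplify the above code, you can use the tidyverse equivalent: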

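library(purrr)
# expand.grid() names its columns Var1 and Var2; map2() walks them in parallel
combos <- expand.grid(names(risk), names(endpoint), stringsAsFactors = FALSE)
map2(combos$Var1, combos$Var2,
     ~ table(risk[[.x]], endpoint[[.y]], dnn = c(.x, .y)))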

Result of the base R version:

$X1
   c
a   0 1
  0 0 1
  1 2 2

$X2
   c
b   0 1
  0 1 2
  1 1 1

$X3
   d
a   0 1
  0 1 0
  1 2 2

$X4
   d
b   0 1
  0 1 2
  1 2 0
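
Since the question asks for a loop into which existing statistics code can be pasted, here is a minimal sketch of that pattern. contingency_stats() and risk_ratio() are hypothetical helpers written for illustration, not functions from any package. The sketch assumes every column is coded 0/1, puts the risk factor in rows and the endpoint in columns, and takes the risk ratio as P(endpoint = 1 | risk = 1) / P(endpoint = 1 | risk = 0).

```r
# Example data from the question
risk     <- data.frame(a = c(1, 0, 1, 1, 1), b = c(0, 1, 1, 0, 0))
endpoint <- data.frame(c = c(1, 1, 0, 0, 1), d = c(0, 0, 0, 1, 1))

# Hypothetical statistic: P(endpoint = 1 | risk = 1) / P(endpoint = 1 | risk = 0).
# Assumes a 2x2 table with risk in rows, endpoint in columns, levels "0" and "1".
# Gives Inf or NaN when the unexposed group has no events, NaN if a group is empty.
risk_ratio <- function(tab) {
  p_exposed   <- tab["1", "1"] / sum(tab["1", ])
  p_unexposed <- tab["0", "1"] / sum(tab["0", ])
  unname(p_exposed / p_unexposed)
}

# Hypothetical loop framework: builds one table per risk/endpoint pair and
# applies stat_fun to it as soon as it is created.
contingency_stats <- function(risk, endpoint, stat_fun) {
  out <- list()
  for (r in names(risk)) {
    for (e in names(endpoint)) {
      # Fixing the levels keeps every table 2x2 even if a value never occurs
      tab <- table(factor(risk[[r]], levels = c(0, 1)),
                   factor(endpoint[[e]], levels = c(0, 1)),
                   dnn = c(r, e))
      # Existing statistics code could go here, working on `tab`
      out[[length(out) + 1]] <- data.frame(risk = r, endpoint = e,
                                           stat = stat_fun(tab),
                                           stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)
}

contingency_stats(risk, endpoint, risk_ratio)
# stat column: 0.5 (a, c), Inf (a, d), 0.75 (b, c), 0 (b, d)
```

Any function that takes a table and returns a single number can be passed instead of risk_ratio, for example function(tab) fisher.test(tab)$p.value. Collecting rows in a list and binding them once at the end avoids repeatedly copying a growing data frame inside the loop.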
