maja zaloznik
maja zaloznik

Reputation: 688

Sampling from a contingency table

I've managed as far as the code below in writing a function to sample from a contingency table - proportional to the frequencies in the cells.

It uses expand.grid and then table to get back to the original size table. Which works fine as long as the sample size is large enough that some categories are not completely missing. Otherwise the table command returns a table that is of smaller dimensions than the original one.

FunSample<- function(Full, n) {
  Frame <- expand.grid(lapply(dim(Full), seq))
  table(Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
Full<-array(c(1,2,3,4), dim=c(2,2,2))
FunSample(Full, 100) # OK
FunSample(Full, 1) # not OK, I want it to still have dim=c(2,2,2)!

My brain has stopped working, I know it has to be a small tweak to get it back on track!?

Upvotes: 2

Views: 818

Answers (3)

Neal Fultz
Neal Fultz

Reputation: 9686

A crosstab is also a multinomial distribution, so you can use rmultinom and reset the dimension on the output. This should give a substantial performance boost and cut down on the code you need to maintain.

> X <- rmultinom(1, 500, Full)
> dim(X) <- dim(Full)
> X
, , 1

     [,1] [,2]
[1,]   18   92
[2,]   45   92

, , 2

     [,1] [,2]
[1,]   28   72
[2,]   49  104

> X2 <-rmultinom(1, 4, Full)
> dim(X2) <- dim(Full)
> X2
, , 1

     [,1] [,2]
[1,]    0    1
[2,]    0    0

, , 2

     [,1] [,2]
[1,]    0    1
[2,]    1    1

Upvotes: 3

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37804

You could use tabulate instead of table; it works on integer-valued vectors, as you have here. You could also get the output into an array by using array directly, just like when you created the original data.

FunSample<- function(Full, n) {
  samp <- sample(1:length(Full), n, prob = Full, replace = TRUE)
  array(tabulate(samp), dim=dim(Full))
}

Upvotes: 1

Prasad Chalasani
Prasad Chalasani

Reputation: 20282

If you don't want table() to "drop" missing combinations, you need to force the columns of Frame to be factors:

FunSample <- function(Full, n) {
  Frame <- as.data.frame( lapply( expand.grid(lapply(dim(Full), seq)), factor) )  
  table( Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}   

> dim( FunSample(Full, 1))
[1] 2 2 2
> dim( FunSample(Full, 100))
[1] 2 2 2

Upvotes: 3

Related Questions