Reputation: 1598
(Disclaimer: With apologies, I did ask a similar question previously. It did not get any answer, and it was closed.)
I would like to create 2-way contingency tables of different sizes, say, from 3x3 to 10x15, where some should show significant association (using chisq.test() or similar) and some should not. I've run into bits and pieces of potentially relevant posts, but I do not see how to connect all the dots. For example, there is this post that discusses how to create random 2-way tables with r2dtable(). Next, there are posts about generating random integers that sum up to a particular value, here and here, which could be useful for defining the row and column marginals for r2dtable().
Nevertheless, it escapes me how to generate a list of such tables. Also, it seems that r2dtable() always returns tables that show no association. I suppose this is to be expected given that the tables are random.
Can anyone help, please?
Upvotes: 0
Views: 576
Reputation: 12461
The missing piece of information in your question is how to define the association - or lack of association - in your tables. That's going to be a case-specific part of any generic solution.
I assume that the "table" you want to end up analysing consists of summarised data, classified by two factors.
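library(dplyr)
library(tidyr)

# Build an nRow x nCol grid in long form, let f fill in Value,
# then pivot the result to one row per Row and one column per Col.
generateRawData <- function(nRow, nCol, f, ...) {
  df <- tibble() %>%
    expand(
      Row=1:nRow,
      Col=1:nCol
    )
  df <- df %>%
    f(...) %>%
    pivot_wider(
      names_from=Col,
      values_from=Value,
      names_prefix="Col"
    )
  return(df)
}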
Here, nCol and nRow have the obvious meanings and f is a function that you have to define and which populates a column named Value in a long tibble with columns named Row and Col. The ellipsis, ..., allows you to pass arbitrary additional arguments to f if needed.
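As a rough illustration of how extra arguments travel through ... to f, here is a minimal sketch. It restates the generator compactly (using tidyr's expand_grid() in place of tibble() %>% expand(), which is my substitution), and trendCells with its slope and noise arguments is a hypothetical cell-filling function made up for this example.

```r
library(dplyr)
library(tidyr)

# Compact restatement of the generator; expand_grid() is a substitution
# for the tibble() %>% expand() step, not the original code.
generateRawData <- function(nRow, nCol, f, ...) {
  grid <- expand_grid(Row = 1:nRow, Col = 1:nCol)
  long <- f(grid, ...)  # extra arguments are forwarded to f here
  pivot_wider(long, names_from = Col, values_from = Value, names_prefix = "Col")
}

# Hypothetical cell function: slope and noise are illustrative arguments.
trendCells <- function(df, slope = 1, noise = 10, ...) {
  df %>% mutate(Value = slope * Col + floor(runif(n(), max = noise)))
}

set.seed(1)
x <- generateRawData(3, 5, trendCells, slope = 4, noise = 25)
x
```

Anything passed after f in the call (here slope and noise) ends up as an argument of f, so one generator can serve many association patterns.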
To generate a table with no association between rows and columns, simply fill Value with random data. For example:
randomCells <- function(df, ...){
  df %>% mutate(Value=5 + floor(runif(df %>% nrow(), max=10)))
}
So that
x <- generateRawData(3, 5, randomCells)
x
# A tibble: 3 x 6
Row Col1 Col2 Col3 Col4 Col5
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 9 11 11 14 8
2 2 13 11 12 5 14
3 3 8 11 14 13 10
and
chisq.test(as.matrix(x))
Pearson's Chi-squared test
data: as.matrix(x)
X-squared = 8.8907, df = 10, p-value = 0.5425
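One thing to watch: x still carries the Row index column, so as.matrix(x) hands chisq.test() a 3 x 6 matrix, which is where df = 10 comes from. If only the counts should enter the test, one possible adjustment is to drop that column first. The sketch below rebuilds the table printed above as a plain data frame (an assumption made so the snippet stands alone) and tests only the five count columns.

```r
# The 3 x 5 table printed above, re-entered by hand so the snippet is standalone.
x <- data.frame(
  Row  = 1:3,
  Col1 = c(9, 13, 8),
  Col2 = c(11, 11, 11),
  Col3 = c(11, 12, 14),
  Col4 = c(14, 5, 13),
  Col5 = c(8, 14, 10)
)

# Drop the Row index so that only the counts are tested.
counts <- as.matrix(x[, -1])
res <- chisq.test(counts)

res$parameter  # degrees of freedom: (3 - 1) * (5 - 1)
res$p.value
```

With a tibble, x %>% select(-Row) before as.matrix() would do the same job.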
Now suppose you want a linear trend across columns, but no association between rows:
linearColumns <- function(df, ...){
  df %>% mutate(Value=4*Col + floor(runif(df %>% nrow(), max=25)))
}
x <- generateRawData(3, 5, linearColumns)
x
# A tibble: 3 x 6
Row Col1 Col2 Col3 Col4 Col5
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 20 12 31 24 46
2 2 9 25 39 46 38
3 3 6 20 35 36 49
giving
chisq.test(as.matrix(x))
Pearson's Chi-squared test
data: as.matrix(x)
X-squared = 22.63, df = 10, p-value = 0.0122
You just need to define f to give the pattern you want. In more complicated cases, it might be easier to define the response at the level of the experimental unit and then aggregate the observed data to form your summary data.
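A minimal base R sketch of that unit-level approach follows. The cell probabilities, the tripled weight on the diagonal cells, and the helper name simulateUnits are all assumptions chosen for illustration: each unit is assigned to a (Row, Col) cell at random, and table() then aggregates the units into counts.

```r
set.seed(42)

nRow <- 3
nCol <- 5
nUnits <- 500

# Independence: cell probabilities are the outer product of the marginals.
pIndep <- outer(rep(1 / nRow, nRow), rep(1 / nCol, nCol))

# Association (illustrative): triple the weight of the diagonal cells.
pAssoc <- pIndep
diagCells <- cbind(1:3, 1:3)
pAssoc[diagCells] <- pAssoc[diagCells] * 3
pAssoc <- pAssoc / sum(pAssoc)

# Hypothetical helper: draw one cell per unit, then recover its row and column.
simulateUnits <- function(p, n) {
  cell <- sample(length(p), n, replace = TRUE, prob = as.vector(p))
  data.frame(
    Row = factor(row(p)[cell], levels = seq_len(nrow(p))),
    Col = factor(col(p)[cell], levels = seq_len(ncol(p)))
  )
}

units <- simulateUnits(pAssoc, nUnits)

# Aggregate the unit-level data into the summary table and test it.
tab <- table(units$Row, units$Col)
tab
chisq.test(tab)
```

Swapping pAssoc for pIndep in the call gives a table generated under independence.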
Apologies, I forgot to set.seed() before generating my examples.
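For reproducible runs, and to get the list of tables of different sizes that the question asks for, something along these lines might work. It is only a sketch in base R: makeTable, the diagonal weight of 4, and the sample size of 50 units per cell are assumptions, and rmultinom() is used here instead of the tibble-based generator.

```r
set.seed(123)  # fix the seed so the same tables come back on every run

# Hypothetical helper: draw one nRow x nCol table of counts.
# With associated = TRUE the diagonal cells get 4 times the weight.
makeTable <- function(nRow, nCol, associated, n = 50 * nRow * nCol) {
  p <- matrix(1, nRow, nCol)
  if (associated) {
    k <- seq_len(min(nRow, nCol))
    p[cbind(k, k)] <- 4
  }
  # rmultinom() normalises prob internally, so the weights need not sum to 1.
  counts <- rmultinom(1, size = n, prob = as.vector(p))
  matrix(counts, nRow, nCol)
}

sizes <- list(c(3, 3), c(5, 8), c(10, 15))
specs <- expand.grid(size = seq_along(sizes), associated = c(FALSE, TRUE))

# One table per combination of size and association setting.
tables <- lapply(seq_len(nrow(specs)), function(i) {
  s <- sizes[[specs$size[i]]]
  makeTable(s[1], s[2], specs$associated[i])
})

# p-value of the chi-squared test for each table.
pvals <- sapply(tables, function(m) chisq.test(m)$p.value)
cbind(specs, pvals)
```

The tables generated without association can still come out significant by chance at roughly the nominal rate, so check pvals rather than assuming the label.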
Upvotes: 1