DrPineapple

Reputation: 349

Apply function to each cell across multiple dataframes in R

Say that I have N data frames with identical dimensions (same number of rows and columns):

set.seed(2)
df1 <- data.frame(replicate(100,rnorm(100)))
df2 <- data.frame(replicate(100,rnorm(100)))
dfN <- data.frame(replicate(100,rnorm(100)))

I want to apply a function (in this case t.test()) across each "cell" of the N data frames, so that the result is a separate data frame containing a t value for each cell-wise test. Essentially, I want to take the first cell of each data frame,

one <- df1[1,1]
two <- df2[1,1]
Nth <- dfN[1,1]

perform a t.test() on those cells,

first.cell.each <- cbind.data.frame(one,two,Nth)
t.test(first.cell.each, mu=0)

and repeat that for every cell (10000 in this case).

edit: clarified

Upvotes: 1

Views: 565

Answers (2)

akrun

Reputation: 887098

We can create a matrix with the same dimensions as the individual datasets to store the p-values from t.test. Then we loop over the row and column indices, extract the corresponding element from each dataset, concatenate those elements, run t.test on them, and assign the p-value to the same row/column index of 'res'.

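# Preallocate the result with the same shape as the input data frames
res <- matrix(NA_real_, nrow = nrow(df1), ncol = ncol(df1))
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    res[i, j] <- t.test(c(df1[i, j], df2[i, j], dfN[i, j]), mu = 0)$p.value
  }
}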

This returns a 100 x 100 matrix:

str(res)
#num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
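The question asks for t values, whereas the loop above stores p-values. A minimal sketch of the same loop that keeps the t statistic instead, using the `statistic` component of the object returned by t.test(); the name `tval` is only illustrative:

```r
set.seed(2)
df1 <- data.frame(replicate(100, rnorm(100)))
df2 <- data.frame(replicate(100, rnorm(100)))
dfN <- data.frame(replicate(100, rnorm(100)))

# Same double loop, but storing the t statistic of each cell-wise test
tval <- matrix(NA_real_, nrow = nrow(df1), ncol = ncol(df1))
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    x <- c(df1[i, j], df2[i, j], dfN[i, j])
    tval[i, j] <- t.test(x, mu = 0)$statistic
  }
}
dim(tval)
```

If a data frame is required rather than a matrix, as.data.frame(tval) converts the result.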

If there are many datasets, we can place them in a list, convert the list to an array, and run the t.test with apply:

lst <-  mget(paste0("df", c(1, 2, "N")))
ar1 <- array(unlist(lst), dim = c(dim(df1), length(lst)))
res2 <-  apply(aperm(ar1, c(3, 1, 2)), c(2,3), FUN = function(x) t.test(x, mu = 0)$p.value) 
str(res2)
# num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
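For a one-sample test the t statistic is just mean / (sd / sqrt(n)), so once the data are in an array the whole result can be computed with array arithmetic instead of calling t.test() once per cell. The following is a sketch of that idea, not part of the original answer; it assumes mu = 0 and no missing values, and the names `arr`, `tstat` and `pval` are illustrative:

```r
set.seed(2)
df1 <- data.frame(replicate(100, rnorm(100)))
df2 <- data.frame(replicate(100, rnorm(100)))
dfN <- data.frame(replicate(100, rnorm(100)))

lst <- list(df1, df2, dfN)
n <- length(lst)

# Stack the data frames into a rows x cols x n array
arr <- array(unlist(lst), dim = c(dim(df1), n))

# Cell-wise mean over the third dimension
m <- rowMeans(arr, dims = 2)

# Cell-wise sample standard deviation; as.vector(m) is recycled across slices
s <- sqrt(rowSums((arr - as.vector(m))^2, dims = 2) / (n - 1))

# One-sample t statistic against mu = 0 and its two-sided p-value
tstat <- m / (s / sqrt(n))
pval <- 2 * pt(-abs(tstat), df = n - 1)
```

This avoids 10000 separate t.test() calls, at the cost of being tied to the one-sample case.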

Upvotes: 2

Zheyuan Li

Reputation: 73285

Suppose all your data frames are stored in a list datlst. Then this does the job:

z <- matrix(tapply(unlist(datlst, use.names = FALSE),
                   rep(gl(prod(dim(datlst[[1]])), 1), length(datlst)),
                   FUN = function (u) t.test(u, mu = 0)$p.value),
            nrow = nrow(datlst[[1]]))

With your example data frames, datlst <- list(df1, df2, dfN), this returns a 100 x 100 matrix:

str(z)
# num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
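To see how the grouping factor works, here is a small sketch on two hypothetical 2 x 2 data frames (`a` and `b`), with sum() standing in for the t.test so the result is easy to check by hand. unlist() flattens each data frame column by column, and repeating the factor 1..4 once per data frame tags every value with its cell position:

```r
# Two toy data frames with the same shape (illustrative only)
a <- data.frame(x = 1:2, y = 3:4)
b <- data.frame(x = 5:6, y = 7:8)
toy <- list(a, b)

# Flattened values: a's cells column by column, then b's cells
vals <- unlist(toy, use.names = FALSE)

# One factor level per cell, repeated once for each data frame
grp <- rep(gl(prod(dim(toy[[1]])), 1), length(toy))

# Apply the function to the values sharing a cell, then restore the shape
z_toy <- matrix(tapply(vals, grp, FUN = sum), nrow = nrow(toy[[1]]))
z_toy
```

Each entry of z_toy is the sum of the matching cells of a and b, which confirms that values are grouped by cell position.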

Upvotes: 1
