ghs101
ghs101

Reputation: 113

Mean of 3 dataframes with same dimensions

I have 3 dataframes with similar dimensions with the following structure:

> str(Results_first_experiment)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   30 obs. of  8 variables:
 $ Strain: chr  "1" "2" "3" "4" ...
 $ 0.5   : num  3452 4126 2200 3125 1392 ...
 $ 1     : num  11918 14445 7899 11735 5813 ...
 $ 2     : num  19848 20872 16089 19759 13746 ...
 $ 3     : num  20188 19937 20509 21012 19792 ...
 $ 4     : num  16586 17074 15426 14748 15470 ...
 $ 5     : num  16850 17288 17801 14051 17305 ...
 $ 6     : num  12816 14682 16325 15948 16069 ...  

> head (Results_first_experiment)
# A tibble: 6 x 8
  Strain `0.5`    `1`    `2`    `3`    `4`    `5`    `6`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 1      3452. 11918. 19848. 20188. 16586. 16850. 12816 
2 2      4126  14445. 20872. 19937. 17074. 17288  14682.
3 3      2200.  7899. 16089. 20509  15426. 17801  16325.
4 4      3125. 11735. 19758. 21012. 14748  14051. 15948.
5 5      1392.  5813. 13746  19792. 15470. 17305. 16069.
6 6      1501   5769  12730  18339. 17369  18645. 18463.

I would like to retrieve the mean of the 3 data frames per position. Ive tried to have a look How to get mean, median, and other statistics over entire matrix, array or dataframe?. However I could not get the mean across the 3 dataframes per position

Any help would be appreciated.

Upvotes: 2

Views: 239

Answers (4)

tgoronflot
tgoronflot

Reputation: 26

First of all, in a data table or matrix, I suggest that you do not name your columns with numbers only. If you work on positions, you can for example named 'p.1', 'pos.1', etc.

When you have repeated data.frames for objects (your positions here), you can use the tidyverse to easily work with. Here is a simple example that you can tranpose to your data:

size <- 5
vec.list <- vector("list", size)
position <- paste0("position.", 1:size)

for(i in 1:size){
  a <- runif(5, 0, 1)
  b <- rnorm(5, 2, 4)
  c <- rnorm(5, 0.5, 1)
  vec.list[[i]] <- data.frame(position, a, b, c)
}
vec.list

# unlist to get a data.frame and sort according to position
df.pos <- do.call(rbind.data.frame, vec.list) %>% arrange(position)

# use tidyr::nest() to nest your data by position
pos.nested <- df.pos %>% group_by(position) %>% nest()

# Then use purrr::map() functions to work on nested data
map(.x = pos.nested$data, .f = ~mean(.x$a, na.rm = T))
map(.x = pos.nested$data, .f = ~sd(.x$c, na.rm = T))

Upvotes: 0

jay.sf
jay.sf

Reputation: 72984

When data are all numerical, it is more common to use matrices instead of data frames, which can be put into an array (a multidimensional matrix). You may convert information from character column "Strain" into row names and delete it. So calculations you're planning to do can be done most easily.

apply(A, 1:2, mean)  # mean of superimposed cells 
#   X1 X2 X3 X4
# 1  2  5  8 11
# 2  3  6  9 12
# 3  4  7 10 13

To get the array, I' ll use this small data.frame-to-matrix converter helper function,

d2m <- function(d, excl=1) `rownames<-`(as.matrix(d[-excl]), d[, 1])

where excl= excludes columns (here just 1).

d2m can be applied to all your data frames simultaneously using sapply (a close relative of lappy) with option simplify="array". To list your data frames you may use list(df1, df2, ...) or use mget() and list by pattern as I do.

A <- sapply(mget(ls(pattern="^Results")), d2m, simplify="array")

The array looks like this,

A
, , Results_first_experiment

  X1 X2 X3 X4
1  1  4  7 10
2  2  5  8 11
3  3  6  9 12

, , Results_second_experiment

  X1 X2 X3 X4
1  2  5  8 11
2  3  6  9 12
3  4  7 10 13

, , Results_third_experiment

  X1 X2 X3 X4
1  3  6  9 12
2  4  7 10 13
3  5  8 11 14

where each matrix can be accessed individually using A[,,1], A[,,2],... .

Now all calculations can be done easily, try:

apply(A, 1:2, mean)  # mean superimposed cells
apply(A, c(1, 3), mean)  # row means each matrix
apply(A, c(2, 3), mean)  # column means each matrix
apply(A, 1, mean)  # mean superimposed rows
apply(A, 2, mean)  # mean superimposed columns
apply(A, 3, mean)  # entire mean of each matrix
mean(A)  # mean of entire array

If you already work with matrices, they may be arrayed like so:

A <- array(c(m1, m2, m3), dim=c(3, 4, 3))  # Note: dim=c(nrow, ncol, nmat)

Data:

Results_first_experiment <- structure(list(St = c("1", "2", "3"), X1 = 1:3, X2 = 4:6, X3 = 7:9, 
    X4 = 10:12), class = "data.frame", row.names = c(NA, -3L))

Results_second_experiment <- structure(list(St = c("1", "2", "3"), X1 = c(2, 3, 4), X2 = c(5, 
6, 7), X3 = c(8, 9, 10), X4 = c(11, 12, 13)), class = "data.frame", row.names = c(NA, 
-3L))

Results_third_experiment <- structure(list(St = c("1", "2", "3"), X1 = c(3, 4, 5), X2 = c(6, 
7, 8), X3 = c(9, 10, 11), X4 = c(12, 13, 14)), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 0

Bruno
Bruno

Reputation: 4150

Something like this should be extensible for more operations

library(tidyverse)

list_random_dfs <- list(mtcars,iris,women)

pipe_to_do <- . %>% 
  select_if(is.numeric) %>% 
  summarise_all(list(means = ~mean(.,na.rm = TRUE)))

mtcars %>% pipe_to_do()
#>   mpg_means cyl_means disp_means hp_means drat_means wt_means qsec_means
#> 1  20.09062    6.1875   230.7219 146.6875   3.596563  3.21725   17.84875
#>   vs_means am_means gear_means carb_means
#> 1   0.4375  0.40625     3.6875     2.8125

list_random_dfs %>% 
  map(pipe_to_do)
#> [[1]]
#>   mpg_means cyl_means disp_means hp_means drat_means wt_means qsec_means
#> 1  20.09062    6.1875   230.7219 146.6875   3.596563  3.21725   17.84875
#>   vs_means am_means gear_means carb_means
#> 1   0.4375  0.40625     3.6875     2.8125
#> 
#> [[2]]
#>   Sepal.Length_means Sepal.Width_means Petal.Length_means Petal.Width_means
#> 1           5.843333          3.057333              3.758          1.199333
#> 
#> [[3]]
#>   height_means weight_means
#> 1           65     136.7333

Created on 2020-01-07 by the reprex package (v0.3.0)

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389047

Since the first column is character I am assuming you want to ignore that column in all the dataframes. You can do

(df1[-1] + df2[-1] + df3[-1])/3

If there are multiple such dataframes put them in a list and use Reduce

list_df <- list(df1[-1], df2[-1], df3[-1])
Reduce("+", list_df) / length(list_df)

Or using pipes

library(magrittr)
library(purrr)

list_df %>% reduce(`+`) %>% divide_by(length(list_df))

Upvotes: 4

Related Questions