Reputation: 113
I have 3 dataframes with similar dimensions with the following structure:
> str(Results_first_experiment)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 30 obs. of 8 variables:
$ Strain: chr "1" "2" "3" "4" ...
$ 0.5 : num 3452 4126 2200 3125 1392 ...
$ 1 : num 11918 14445 7899 11735 5813 ...
$ 2 : num 19848 20872 16089 19759 13746 ...
$ 3 : num 20188 19937 20509 21012 19792 ...
$ 4 : num 16586 17074 15426 14748 15470 ...
$ 5 : num 16850 17288 17801 14051 17305 ...
$ 6 : num 12816 14682 16325 15948 16069 ...
> head (Results_first_experiment)
# A tibble: 6 x 8
Strain `0.5` `1` `2` `3` `4` `5` `6`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 3452. 11918. 19848. 20188. 16586. 16850. 12816
2 2 4126 14445. 20872. 19937. 17074. 17288 14682.
3 3 2200. 7899. 16089. 20509 15426. 17801 16325.
4 4 3125. 11735. 19758. 21012. 14748 14051. 15948.
5 5 1392. 5813. 13746 19792. 15470. 17305. 16069.
6 6 1501 5769 12730 18339. 17369 18645. 18463.
I would like to retrieve the mean of the 3 data frames per position. Ive tried to have a look How to get mean, median, and other statistics over entire matrix, array or dataframe?. However I could not get the mean across the 3 dataframes per position
Any help would be appreciated.
Upvotes: 2
Views: 239
Reputation: 26
First of all, in a data table or matrix, I suggest that you do not name your columns with numbers only. If you work on positions, you can for example named 'p.1', 'pos.1', etc.
When you have repeated data.frames for objects (your positions here), you can use the tidyverse to easily work with. Here is a simple example that you can tranpose to your data:
size <- 5
vec.list <- vector("list", size)
position <- paste0("position.", 1:size)
for(i in 1:size){
a <- runif(5, 0, 1)
b <- rnorm(5, 2, 4)
c <- rnorm(5, 0.5, 1)
vec.list[[i]] <- data.frame(position, a, b, c)
}
vec.list
# unlist to get a data.frame and sort according to position
df.pos <- do.call(rbind.data.frame, vec.list) %>% arrange(position)
# use tidyr::nest() to nest your data by position
pos.nested <- df.pos %>% group_by(position) %>% nest()
# Then use purrr::map() functions to work on nested data
map(.x = pos.nested$data, .f = ~mean(.x$a, na.rm = T))
map(.x = pos.nested$data, .f = ~sd(.x$c, na.rm = T))
Upvotes: 0
Reputation: 72984
When data are all numerical, it is more common to use matrices instead of data frames, which can be put into an array (a multidimensional matrix). You may convert information from character column "Strain"
into row names and delete it. So calculations you're planning to do can be done most easily.
apply(A, 1:2, mean) # mean of superimposed cells
# X1 X2 X3 X4
# 1 2 5 8 11
# 2 3 6 9 12
# 3 4 7 10 13
To get the array, I' ll use this small data.frame-to-matrix converter helper function,
d2m <- function(d, excl=1) `rownames<-`(as.matrix(d[-excl]), d[, 1])
where excl=
excludes columns (here just 1
).
d2m
can be applied to all your data frames simultaneously using sapply
(a close relative of lappy
) with option simplify="array"
. To list your data frames you may use list(df1, df2, ...)
or use mget()
and list by pattern as I do.
A <- sapply(mget(ls(pattern="^Results")), d2m, simplify="array")
The array looks like this,
A
, , Results_first_experiment
X1 X2 X3 X4
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
, , Results_second_experiment
X1 X2 X3 X4
1 2 5 8 11
2 3 6 9 12
3 4 7 10 13
, , Results_third_experiment
X1 X2 X3 X4
1 3 6 9 12
2 4 7 10 13
3 5 8 11 14
where each matrix can be accessed individually using A[,,1]
, A[,,2]
,... .
Now all calculations can be done easily, try:
apply(A, 1:2, mean) # mean superimposed cells
apply(A, c(1, 3), mean) # row means each matrix
apply(A, c(2, 3), mean) # column means each matrix
apply(A, 1, mean) # mean superimposed rows
apply(A, 2, mean) # mean superimposed columns
apply(A, 3, mean) # entire mean of each matrix
mean(A) # mean of entire array
If you already work with matrices, they may be arrayed like so:
A <- array(c(m1, m2, m3), dim=c(3, 4, 3)) # Note: dim=c(nrow, ncol, nmat)
Data:
Results_first_experiment <- structure(list(St = c("1", "2", "3"), X1 = 1:3, X2 = 4:6, X3 = 7:9,
X4 = 10:12), class = "data.frame", row.names = c(NA, -3L))
Results_second_experiment <- structure(list(St = c("1", "2", "3"), X1 = c(2, 3, 4), X2 = c(5,
6, 7), X3 = c(8, 9, 10), X4 = c(11, 12, 13)), class = "data.frame", row.names = c(NA,
-3L))
Results_third_experiment <- structure(list(St = c("1", "2", "3"), X1 = c(3, 4, 5), X2 = c(6,
7, 8), X3 = c(9, 10, 11), X4 = c(12, 13, 14)), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 0
Reputation: 4150
Something like this should be extensible for more operations
library(tidyverse)
list_random_dfs <- list(mtcars,iris,women)
pipe_to_do <- . %>%
select_if(is.numeric) %>%
summarise_all(list(means = ~mean(.,na.rm = TRUE)))
mtcars %>% pipe_to_do()
#> mpg_means cyl_means disp_means hp_means drat_means wt_means qsec_means
#> 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875
#> vs_means am_means gear_means carb_means
#> 1 0.4375 0.40625 3.6875 2.8125
list_random_dfs %>%
map(pipe_to_do)
#> [[1]]
#> mpg_means cyl_means disp_means hp_means drat_means wt_means qsec_means
#> 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875
#> vs_means am_means gear_means carb_means
#> 1 0.4375 0.40625 3.6875 2.8125
#>
#> [[2]]
#> Sepal.Length_means Sepal.Width_means Petal.Length_means Petal.Width_means
#> 1 5.843333 3.057333 3.758 1.199333
#>
#> [[3]]
#> height_means weight_means
#> 1 65 136.7333
Created on 2020-01-07 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 389047
Since the first column is character I am assuming you want to ignore that column in all the dataframes. You can do
(df1[-1] + df2[-1] + df3[-1])/3
If there are multiple such dataframes put them in a list and use Reduce
list_df <- list(df1[-1], df2[-1], df3[-1])
Reduce("+", list_df) / length(list_df)
Or using pipes
library(magrittr)
library(purrr)
list_df %>% reduce(`+`) %>% divide_by(length(list_df))
Upvotes: 4