simpson

Reputation: 335

Column means of nested lists

I am trying to compute column means for only the numeric columns of each data frame in my list, and return a list with the same structure, including the non-numeric columns.

Here is my example list:

#Create dataframes ABC and DEF, where 'left' and 'right' columns are numeric and the 'up' column contains characters
ABC <- cbind.data.frame(left = c(2, 3, 5), right = c(5, 8, 4) , up = c("aa","aa", "aa"))
ABC$up <- as.character(ABC$up)
DEF <-  cbind.data.frame(left = c(7, 2, 9), right = c(3, 6, 1) , up = c("bb","bb", "bb"))
DEF$up <- as.character(DEF$up)

#Create a list called mylist, containing two dataframes: ABC and DEF
mylist <- list(ABC = ABC, DEF = DEF)
$ABC
#  left  right  up 
#1  2     5     aa  
#2  3     8     aa
#3  5     4     aa  

$DEF
#  left  right  up 
#1  7     3     bb  
#2  2     6     bb
#3  9     1     bb

I would like mylist of column means to look like this:

perfect.col.means.list
$ABC
#  left  right  up 
#1  3.33  5.67  aa  

$DEF
#  left  right  up 
#1  6     3.33  bb  

I have tried:

 means.by.col <- lapply(mylist, function(x) {
  as.data.frame(lapply(x, function(y) if(is.numeric(y)) colMeans(y) else y))})

And this returns an error: Error in colMeans(y) : 'x' must be an array of at least two dimensions

I have also tried:

means.by.col <- lapply(mylist, function(x) {
  as.data.frame(rapply(x, classes = "numeric", f = colMeans, how = "replace"))})

And this returns a similar error: Error in (function (x, na.rm = FALSE, dims = 1L): 'x' must be an array of at least two dimensions

I have also tried using mean() on its own. This runs without error, but every row of each numeric column is filled with the mean instead of a single row of column means being returned.

means.only <- lapply(mylist, function(x) {
    as.data.frame(lapply(x, function(y) if(is.numeric(y)) mean(y) else y))})
> means.only
$ABC
      left    right up
1 3.333333 5.666667 aa
2 3.333333 5.666667 aa
3 3.333333 5.666667 aa

$DEF
  left    right up
1    6 3.333333 bb
2    6 3.333333 bb
3    6 3.333333 bb

Any suggestions?

Upvotes: 1

Views: 376

Answers (1)

akrun

Reputation: 887213

We can loop over the list with map, group by 'up', and summarise across the numeric columns:

library(dplyr)
library(purrr)
map(mylist, ~ .x %>%
      group_by(up) %>% 
       summarise(across(where(is.numeric), mean), .groups = 'drop'))

-output

#$ABC
# A tibble: 1 x 3
#  up     left right
#  <chr> <dbl> <dbl>
#1 aa     3.33  5.67

#$DEF
# A tibble: 1 x 3
#  up     left right
#  <chr> <dbl> <dbl>
#1 bb        6  3.33

Or with base R

lapply(mylist, function(x) data.frame(as.list(colMeans(x[1:2])), up = x$up[1]))
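One way to see why colMeans works here but failed in the question's attempts: x[1:2] is still a data frame (two-dimensional), whereas each y inside lapply(x, ...) is a plain vector. A minimal sketch, using a small data frame `x` made up for illustration:

```r
# Hypothetical example data, mirroring the numeric columns of ABC
x <- data.frame(left = c(2, 3, 5), right = c(5, 8, 4))

dim(x[1:2])   # two dimensions: rows and columns
dim(x$left)   # NULL: a single column extracted this way is a plain vector

# Works: the input is two-dimensional
colMeans(x[1:2])

# Fails: capture the error message instead of stopping
msg <- tryCatch(colMeans(x$left), error = function(e) conditionMessage(e))
msg
```

For a single vector, mean() is the appropriate function, which is why the question's third attempt runs.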

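The OP's mean() attempt returned three rows because as.data.frame recycles the single mean to match the length of the 'up' column. Taking only the first value of 'up' gives one row: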

lapply(mylist, function(x) data.frame(lapply(x, 
         function(y) if(is.numeric(y)) mean(y) else y[1])))

#$ABC
#      left    right up
#1 3.333333 5.666667 aa

#$DEF
#  left    right up
#1    6 3.333333 bb
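The base R approaches above either hard-code the numeric column positions or rely on every non-numeric column being constant. As a rough sketch that detects numeric columns automatically and keeps the original column order, something like the following might work. The helper name `col_means_keep_chars` is made up for illustration, and it assumes that taking the first value of each non-numeric column is acceptable:

```r
# Example data rebuilt from the question
ABC <- data.frame(left = c(2, 3, 5), right = c(5, 8, 4),
                  up = c("aa", "aa", "aa"), stringsAsFactors = FALSE)
DEF <- data.frame(left = c(7, 2, 9), right = c(3, 6, 1),
                  up = c("bb", "bb", "bb"), stringsAsFactors = FALSE)
mylist <- list(ABC = ABC, DEF = DEF)

# Hypothetical helper: mean of each numeric column,
# first value of every other column, one row per data frame
col_means_keep_chars <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  means  <- lapply(df[is_num], mean)
  firsts <- lapply(df[!is_num], function(col) col[1])
  # Combine and restore the original column order
  out <- c(means, firsts)[names(df)]
  as.data.frame(out, stringsAsFactors = FALSE)
}

result <- lapply(mylist, col_means_keep_chars)
result
```

Note that this silently discards any variation in the non-numeric columns; if 'up' can differ within a data frame, grouping by it (as in the dplyr version) is the safer choice.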

Upvotes: 1
