wesleysc352

Reputation: 617

Calculate the average of specific vectors of various elements of a list() in R and convert to data.frame

I have a very large list() with more than 2000 elements. Each element is a data frame with two vectors (x and y), and the number of rows differs between elements.

Example:

new_list<-list(data.frame(x = c(1,2,3),
                          y = c(3,4,5)),
               data.frame(x = c(3,2,2,2,3,8),
                          y = c(5,2,3,5,6,7)),
               data.frame(x = c(3,2,2,1,1),
                          y = c(5,2,3,3,2)))

I would like to average only the x vectors in this list to get something like this:

df_mean<-data.frame(x=c(2,3.333,1.8))

Upvotes: 0

Views: 98

Answers (6)

jblood94

Reputation: 16981

Using data.table's rbindlist:

data.table::rbindlist(new_list, idcol = TRUE)[, .(x = mean(x)), .id][, 2]
#>           x
#> 1: 2.000000
#> 2: 3.333333
#> 3: 1.800000

Upvotes: 0

Peter H.

Reputation: 2164

In this relatively simple case, I think I would prefer the more concise solution suggested by @quinten. However, if you need to calculate more statistics on the nested data frames, you could consider something like this:

library(tidyverse)

tibble(data = new_list) |> 
  rowwise() |> 
  summarise(
    x = mean(data$x)
  )
#> # A tibble: 3 × 1
#>       x
#>   <dbl>
#> 1  2   
#> 2  3.33
#> 3  1.8

or alternatively

tibble(data = new_list) |> 
  rowwise() |> 
  summarise(
    data |> 
      summarise(x = mean(x))
  )
#> # A tibble: 3 × 1
#>       x
#>   <dbl>
#> 1  2   
#> 2  3.33
#> 3  1.8
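
When several statistics are needed per data frame, the same idea can also be sketched in base R. This is only a minimal illustration using the example data from the question; the column names n, x_mean and y_mean are assumptions chosen for the example, not something the question asks for.

```r
# Example data copied from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# Build a one-row summary per list element, then stack the rows.
# The statistics and column names here are illustrative only.
stats <- do.call(rbind, lapply(new_list, function(d) {
  data.frame(n = nrow(d), x_mean = mean(d$x), y_mean = mean(d$y))
}))

stats
```

Each row of stats corresponds to one element of new_list, in the original order, so further columns can be added inside the anonymous function as needed.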

Upvotes: 2

SteveM

Reputation: 2301

Good answer by Quinten. I usually prefer to follow the KISS principle. Here is a format that I find syntactically simpler:

len <- length(new_list)
sapply(1:len, function(z) mean(new_list[[z]][[1]]))
#> [1] 2.000000 3.333333 1.800000

Upvotes: 2

Maël

Reputation: 51914

You can also enframe the list and do a mean by group:

library(dplyr)  # 1.1.0 or higher (needed for .by)
library(tibble) # enframe()
library(tidyr)  # unnest()
enframe(new_list) %>% 
  unnest(value) %>% 
  summarise(x = mean(x), .by = name)

#   name     x
#1     1  2   
#2     2  3.33
#3     3  1.8 

Upvotes: 1

Quinten

Reputation: 41225

You could use sapply to apply colMeans to the columns whose names contain 'x', like this:

data.frame(x = sapply(new_list, \(x) colMeans(x[grepl('x', names(x))])))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000

@nicola suggested a better option like this (thanks!):

data.frame(x = sapply(new_list, \(x) mean(x$x)))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000

Created on 2023-02-06 with reprex v2.0.2
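
For a list with thousands of elements, vapply may be a slightly safer drop-in for sapply: it raises an error if the function returns anything other than a single numeric value, whereas sapply would silently change the shape of the result. A minimal sketch, assuming every element has a numeric column named x as in the question's example:

```r
# Example data copied from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# numeric(1) declares that each call must return exactly one numeric value
df_mean <- data.frame(
  x = vapply(new_list, function(d) mean(d$x), numeric(1))
)

df_mean
```

An empty list also behaves predictably here: vapply returns a zero-length numeric vector, so df_mean becomes a data frame with zero rows rather than a list.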

Upvotes: 5

akrun

Reputation: 886948

Using purrr's map_dfr:

library(purrr)
library(dplyr)
map_dfr(new_list, ~ .x %>% 
    summarise(x = mean(x)))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000

Upvotes: 2
