Reputation: 617
I have a very large list() with more than 2000 elements. Each element is a data frame with two columns (x and y), and the number of rows differs between the elements of the list.
Example:
new_list <- list(
  data.frame(x = c(1, 2, 3),
             y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8),
             y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1),
             y = c(5, 2, 3, 3, 2))
)
I would like to average only the x vectors in this list, one mean per element, to get something like this:
df_mean<-data.frame(x=c(2,3.333,1.8))
Upvotes: 0
Views: 98
Reputation: 16981
Using data.table's rbindlist:
data.table::rbindlist(new_list, idcol = TRUE)[, .(x = mean(x)), .id][, 2]
#> x
#> 1: 2.000000
#> 2: 3.333333
#> 3: 1.800000
Upvotes: 0
Reputation: 2164
In this relatively simple case, I think I would prefer the more concise solution suggested by @quinten. However, if you need to calculate more statistics on the nested data frames, you could consider something like this:
library(tidyverse)
tibble(data = new_list) |>
  rowwise() |>
  summarise(
    x = mean(data$x)
  )
#> # A tibble: 3 × 1
#> x
#> <dbl>
#> 1 2
#> 2 3.33
#> 3 1.8
Or alternatively:
tibble(data = new_list) |>
  rowwise() |>
  summarise(
    data |>
      summarise(x = mean(x))
  )
#> # A tibble: 3 × 1
#> x
#> <dbl>
#> 1 2
#> 2 3.33
#> 3 1.8
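As a rough sketch of the "more statistics" case mentioned above, the same rowwise() pattern can return several summaries per nested data frame at once. The column names n, mean_x, mean_y and max_x are only illustrative choices, not part of the question, and new_list is repeated here so the snippet runs on its own:

```r
library(dplyr)
library(tibble)

# Example data from the question, repeated so the snippet is self-contained
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# One output row per list element; each argument is one statistic
# (the output column names are illustrative)
stats <- tibble(data = new_list) |>
  rowwise() |>
  summarise(
    n      = nrow(data),    # number of rows in this data frame
    mean_x = mean(data$x),  # mean of the x column
    mean_y = mean(data$y),  # mean of the y column
    max_x  = max(data$x)    # largest x value
  )

stats
```

Each extra statistic is just one more argument to summarise(), which is where this approach starts to pay off compared with a one-line sapply().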
Upvotes: 2
Reputation: 2301
Good answer by Quinten. I usually prefer to follow the KISS principle. Here is a format that I find syntactically simpler:
len <- length(new_list)
# seq_len() is safe for an empty list, where 1:len would yield c(1, 0)
sapply(seq_len(len), function(z) mean(new_list[[z]][[1]]))
[1] 2.000000 3.333333 1.800000
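A possible variant of the same idea, sketched here as an assumption about what might suit a list of 2000+ elements: iterate over the list directly rather than over an index, and use vapply() so the result is guaranteed to be a numeric vector with one value per element. The name x_means is only illustrative.

```r
# Example data from the question, repeated so the snippet is self-contained
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# vapply() checks that every call returns exactly one numeric value;
# selecting the column by name avoids relying on x being the first column
x_means <- vapply(new_list, function(df) mean(df$x), numeric(1))

x_means
```

Unlike sapply(), vapply() raises an error if any element returns something other than a single number, and it returns numeric(0) for an empty list.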
Upvotes: 2
Reputation: 51914
You can also enframe the list and do a mean by group:
library(dplyr) # 1.1.0 or higher, needed for .by
library(tibble)
library(tidyr) # provides unnest()
enframe(new_list) %>%
  unnest(value) %>%
  summarise(x = mean(x), .by = name)
# name x
#1 1 2
#2 2 3.33
#3 3 1.8
Upvotes: 1
Reputation: 41225
You could calculate the colMeans of each column whose name contains x with sapply, like this:
data.frame(x = sapply(new_list, \(x) colMeans(x[grepl('x', names(x))])))
#> x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000
@nicola suggested a better option like this (thanks!):
data.frame(x = sapply(new_list, \(x) mean(x$x)))
#> x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000
Created on 2023-02-06 with reprex v2.0.2
Upvotes: 5
Reputation: 886948
Using map_dfr from purrr:
library(purrr)
library(dplyr)
map_dfr(new_list, ~ .x %>%
  summarise(x = mean(x)))
x
1 2.000000
2 3.333333
3 1.800000
Upvotes: 2