I am trying to compute the column means of only the numeric columns of each data frame in a list, and return a list with the same structure, including the non-numeric columns.
Here is my example list:
#Create dataframes ABC and DEF, where 'left' and 'right' columns are numeric and the 'up' column contains characters
ABC <- cbind.data.frame(left = c(2, 3, 5), right = c(5, 8, 4) , up = c("aa","aa", "aa"))
ABC$up <- as.character(ABC$up)
DEF <- cbind.data.frame(left = c(7, 2, 9), right = c(3, 6, 1) , up = c("bb","bb", "bb"))
DEF$up <- as.character(DEF$up)
#Create a list called mylist, containing two dataframes: ABC and DEF
mylist <- list(ABC = ABC, DEF = DEF)
$ABC
# left right up
#1 2 5 aa
#2 3 8 aa
#3 5 4 aa
$DEF
# left right up
#1 7 3 bb
#2 2 6 bb
#3 9 1 bb
I would like the resulting list of column means to look like this:
perfect.col.means.list
$ABC
# left right up
#1 3.33 5.67 aa
$DEF
# left right up
#1 6 3.33 bb
I have tried:
means.by.col <- lapply(mylist, function(x) {
as.data.frame(lapply(x, function(y) if(is.numeric(y)) colMeans(y) else y))})
And this returns an error: Error in colMeans(y) : 'x' must be an array of at least two dimensions
I have also tried:
means.by.col <- lapply(mylist, function(x) {
as.data.frame(rapply(x, classes = "numeric", f = colMeans, how = "replace"))})
And this returns a similar error: Error in (function (x, na.rm = FALSE, dims = 1L): 'x' must be an array of at least two dimensions
I have also tried calling mean() on each numeric column. This runs without error, but the mean is repeated in every row of the column instead of being returned as a single value:
means.only <- lapply(mylist, function(x) {
as.data.frame(lapply(x, function(y) if(is.numeric(y)) mean(y) else y))})
> means.only
$ABC
left right up
1 3.333333 5.666667 aa
2 3.333333 5.666667 aa
3 3.333333 5.666667 aa
$DEF
left right up
1 6 3.333333 bb
2 6 3.333333 bb
3 6 3.333333 bb
Any suggestions?
We can loop over the list with map, group by 'up', and summarise across the numeric columns:
library(dplyr)
library(purrr)
map(mylist, ~ .x %>%
group_by(up) %>%
summarise(across(where(is.numeric), mean), .groups = 'drop'))
-output
#$ABC
# A tibble: 1 x 3
# up left right
# <chr> <dbl> <dbl>
#1 aa 3.33 5.67
#$DEF
# A tibble: 1 x 3
# up left right
# <chr> <dbl> <dbl>
#1 bb 6 3.33
Or with base R
lapply(mylist, function(x) data.frame(as.list(colMeans(x[1:2])), up = x$up[1]))
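The one-liner above hard-codes the numeric columns as positions 1:2 and the character column as 'up'. If the column layout is not known in advance, a minimal sketch along the following lines may help. The helper name col_means_keep_others is made up for illustration, and the sketch assumes every data frame has at least one numeric column and that the first value of each non-numeric column is the one to keep.

```r
# Rebuild the example list so the snippet runs on its own
mylist <- list(
  ABC = data.frame(left = c(2, 3, 5), right = c(5, 8, 4), up = "aa"),
  DEF = data.frame(left = c(7, 2, 9), right = c(3, 6, 1), up = "bb")
)

# Hypothetical helper: one-row summary that keeps the original column order
col_means_keep_others <- function(x) {
  num <- vapply(x, is.numeric, logical(1))   # TRUE for numeric columns
  out <- x[1, , drop = FALSE]                # first row supplies the non-numeric values
  out[num] <- as.list(colMeans(x[num]))      # overwrite numeric columns with their means
  rownames(out) <- NULL
  out
}

result <- lapply(mylist, col_means_keep_others)
```

Here colMeans() is applied to the data frame subset x[num], which is two-dimensional, so it does not hit the "at least two dimensions" error.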
The OP's code also works if, for the non-numeric column 'up', we return only its first value, so that every column has length 1:
lapply(mylist, function(x) data.frame(lapply(x,
function(y) if(is.numeric(y)) mean(y) else y[1])))
#$ABC
# left right up
#1 3.333333 5.666667 aa
#$DEF
# left right up
#1 6 3.333333 bb
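For reference, here is a small sketch of why the original attempts behaved the way they did. Inside the inner lapply, each y is a single column, that is, a plain vector, so colMeans() has nothing two-dimensional to work on. mean() does work, but its length-1 result gets recycled against the length-3 'up' column when the data frame is built. The object names below are only for illustration.

```r
y <- c(2, 3, 5)   # one column of ABC, as the inner lapply() sees it

# A plain vector has no dim attribute, so colMeans() refuses it
no_dims <- is.null(dim(y))
colmeans_fails <- inherits(try(colMeans(y), silent = TRUE), "try-error")

# mean(y) has length 1; next to a length-3 column it is recycled to 3 rows
recycled <- data.frame(left = mean(y), up = c("aa", "aa", "aa"))
nrow(recycled)    # 3

# Taking only the first value of 'up' keeps everything at length 1
single <- data.frame(left = mean(y), up = c("aa", "aa", "aa")[1])
nrow(single)      # 1
```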