I have a list of dataframes for which I want to obtain (in a separate dataframe) the row mean of a specified column which may or may not exist in all dataframes of the list. My problem comes when the specified column does not exist in at least one of the dataframes of the list.
Assume the following example list of dataframes:
df1 <- read.table(text = 'X A B C
name1 1 2 3
name2 5 10 4',
header = TRUE)
df2 <- read.table(text = 'X B C A
name1 8 1 31
name2 9 9 8',
header = TRUE)
df3 <- read.table(text = 'X B A E
name1 9 9 29
name2 5 15 55',
header = TRUE)
mylist_old <-list(df1, df2)
mylist_new <-list(df1, df2, df3)
Assume I want the row means of column C. The following piece of code works perfectly when the list of dataframes (mylist_old) is composed of elements df1 and df2:
Mean_C <- rowMeans(do.call(cbind, lapply(mylist_old, "[", "C")))
Mean_C <- as.data.frame(Mean_C)
The trouble comes when the list contains at least one dataframe for which column C does not exist, which in my example is the case of df3, that is, for list mylist_new:
Mean_C <- rowMeans(do.call(cbind, lapply(mylist_new, "[", "C")))
This leads to: "Error in `[.data.frame`(X[[i]], ...) : undefined columns selected"
One way to circumvent this issue is to exclude df3 from mylist_new. However, my real program has a list of 64 dataframes for which I do not know whether column C exists or not. I would like to lapply my piece of code only where column C is detected, that is, apply the command to the list of dataframes but only to those dataframes in which column C exists.
I tried this
if("C" %in% colnames(mylist_new))
{
Mean_C <- rowMeans(do.call(cbind, lapply(mylist_new, "[", "C")))
Mean_C <- as.data.frame(Mean_C)
}
But nothing happens, probably because colnames refers to the list and not to each dataframe of the list. With 64 dataframes, I cannot refer to each one "manually" and need an automated procedure.
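That guess can be checked directly. The sketch below rebuilds the example data with data.frame() (an assumption made for brevity; the values are the ones given above) and compares the list-level test with a per-dataframe test. The names list_level and has_C are illustrative only.

```r
# Example data from the question, rebuilt with data.frame() for brevity
df1 <- data.frame(X = c("name1", "name2"), A = c(1, 5), B = c(2, 10), C = c(3, 4))
df2 <- data.frame(X = c("name1", "name2"), B = c(8, 9), C = c(1, 9), A = c(31, 8))
df3 <- data.frame(X = c("name1", "name2"), B = c(9, 5), A = c(9, 15), E = c(29, 55))
mylist_new <- list(df1, df2, df3)

# A plain list has no column names, so colnames() returns NULL here
# and the %in% test is FALSE: the body of the if() is never run
list_level <- "C" %in% colnames(mylist_new)
list_level
# [1] FALSE

# Asking each dataframe separately gives one logical value per list element
has_C <- vapply(mylist_new, function(d) "C" %in% names(d), logical(1))
has_C
# [1]  TRUE  TRUE FALSE
```

A logical vector like has_C can then be used to subset the list, for example mylist_new[has_C], before extracting the column.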
One way is to use purrr::safely. For each iteration it returns a list with a result and an error element; we can then transpose, extract result, and remove the NULL results with compact:
library(tidyverse)
rowMeans(do.call(cbind, transpose(
lapply(mylist_new, safely(`[`), "C"))$result %>% compact()))
# [1] 2.0 6.5
We could also use the otherwise parameter to get an NA result rather than NULL, and then set na.rm to TRUE in rowMeans.
rowMeans(na.rm = TRUE, do.call(cbind, transpose(
lapply(mylist_new, safely(`[`, otherwise= NA), "C"))$result))
# [1] 2.0 6.5
This was to address your case with minimal modifications. If I had to solve this precise issue, I would do it the following way:
map(mylist_new, "C") %>% compact() %>% pmap_dbl(~mean(c(...)))
# [1] 2.0 6.5
We extract the C element, remove it when it's NULL, and then compute the mean by element.
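For comparison, the same three steps (extract, drop NULL, average by element) can be sketched in base R. This is only an illustration, not a replacement for the purrr version; it assumes every dataframe lists the same rows in the same order, and it rebuilds the example data with data.frame() for brevity.

```r
# Example data from the question, rebuilt with data.frame() for brevity
df1 <- data.frame(X = c("name1", "name2"), A = c(1, 5), B = c(2, 10), C = c(3, 4))
df2 <- data.frame(X = c("name1", "name2"), B = c(8, 9), C = c(1, 9), A = c(31, 8))
df3 <- data.frame(X = c("name1", "name2"), B = c(9, 5), A = c(9, 15), E = c(29, 55))
mylist_new <- list(df1, df2, df3)

# 1. Extract C from each dataframe; [[ returns NULL when the column is absent
cols <- lapply(mylist_new, function(d) d[["C"]])

# 2. Drop the NULL entries (the base counterpart of compact())
cols <- Filter(Negate(is.null), cols)

# 3. Bind the remaining vectors as matrix columns and average across each row
mean_C <- rowMeans(do.call(cbind, cols))
mean_C
# [1] 2.0 6.5
```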
This might be more efficient (not sure):
map(set_names(mylist_new), "C") %>% compact() %>% as_tibble() %>% rowMeans()
# [1] 2.0 6.5
One more, using reshaping this time:
map_dfr(mylist_new, ~gather(.,,,-1)) %>%
group_by(X) %>%
filter(key == "C") %>%
summarize_at("value", mean)
# # A tibble: 2 x 2
# X value
# <fct> <dbl>
# 1 name1 2
# 2 name2 6.5
And a base version, quite readable, with a somewhat awkward step where several columns have the same name, but it's on a temp object so that's not that bad:
wide <- do.call(cbind, mylist_new)
rowMeans(wide[names(wide) == "C"])
# [1] 2.0 6.5
Here is one option: Filter the list elements, then apply lapply on the filtered list:
rowMeans(do.call(cbind, lapply(Filter(function(x) "C" %in% names(x),
mylist_new), `[[`, "C")))
#[1] 2.0 6.5
Or, using tidyverse, without Filtering, but making use of select to ignore the cases where the column is not present:
library(tidyverse)
map(mylist_new, ~ .x %>%
select(one_of("C"))) %>% # gives a warning
bind_cols %>%
rowMeans
#[1] 2.0 6.5
It may be better to have some warning that the column is not present.
Or, without a warning:
map(mylist_new, ~ .x %>%
select(matches("^C$"))) %>%
bind_cols %>%
rowMeans
#[1] 2.0 6.5
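Since a warning about a missing column may be useful, here is a minimal base R sketch of a wrapper that reports which list elements were skipped. The function name row_mean_of and its arguments are hypothetical (they do not come from any package), and the sketch assumes all dataframes share the same rows in the same order.

```r
# Hypothetical helper: row means of one column across a list of dataframes,
# skipping (and reporting) the dataframes that lack the column
row_mean_of <- function(dfs, col) {
  has_col <- vapply(dfs, function(d) col %in% names(d), logical(1))
  if (!any(has_col)) {
    stop("no dataframe contains column '", col, "'")
  }
  if (!all(has_col)) {
    warning("column '", col, "' is missing from list element(s): ",
            paste(which(!has_col), collapse = ", "))
  }
  # Keep only the dataframes that have the column, then average by row
  cols <- lapply(dfs[has_col], function(d) d[[col]])
  rowMeans(do.call(cbind, cols))
}

# Example data from the question, rebuilt with data.frame() for brevity
df1 <- data.frame(X = c("name1", "name2"), A = c(1, 5), B = c(2, 10), C = c(3, 4))
df2 <- data.frame(X = c("name1", "name2"), B = c(8, 9), C = c(1, 9), A = c(31, 8))
df3 <- data.frame(X = c("name1", "name2"), B = c(9, 5), A = c(9, 15), E = c(29, 55))
mylist_new <- list(df1, df2, df3)

# Warns that element 3 lacks column C; the warning is silenced here
res <- suppressWarnings(row_mean_of(mylist_new, "C"))
res
# [1] 2.0 6.5
```

With 64 dataframes, the warning message lists the positions of the skipped elements, which makes it easier to see how many dataframes actually contributed to the mean.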
We can use if to check the names before we do the subset:
rowMeans(do.call(cbind,
lapply(mylist_new, function(x) if('C' %in% names(x)) x['C'] else NA)),na.rm = TRUE)
Or using map_if in purrr 0.3.2:
library(purrr)
rowMeans(do.call(cbind,map_if(mylist_new,
function(x) 'C' %in% names(x),
'C', .else=~return(NA))),na.rm = TRUE)
# [1] 2.0 6.5