Shahin

Reputation: 1316

A problem with dplyr (applying which.min to a list of dataframes)

When I use which() or which.min() inside map() or lapply(), I get only one index per list element, even though several elements of the vector satisfy the condition.

Data:

library(dplyr)
library(purrr)
foo <- dplyr::tibble(a=c("a","b",NA),b=c("a","b","c"),colC=c("a",NA,"c"))
bar <- dplyr::tibble(a=c("a","b",NA),b=c("a","b","c"),colC=c("a",NA,"c"))
all_tibbles <- c("foo","bar")
mget(all_tibbles)
$foo
# A tibble: 3 x 3
  a     b     colC 
  <chr> <chr> <chr>
1 a     a     a    
2 b     b     NA   
3 NA    c     c    

$bar
# A tibble: 3 x 3
  a     b     colC 
  <chr> <chr> <chr>
1 a     a     a    
2 b     b     NA   
3 NA    c     c
mget(all_tibbles) %>%
  map(~ rowSums(!is.na(.x)))
$foo
[1] 3 2 2

$bar
[1] 3 2 2
mget(all_tibbles) %>% map(~ rowSums(!is.na(.x))) %>% map(~ which.min(.x))
lapply(mget(all_tibbles) %>% map(~ rowSums(!is.na(.x)) ),which.min)
$foo
[1] 2

$bar
[1] 2

As you can see, row 2 is not the only row with the minimum count: rows 2 and 3 both have two non-NA values. I was expecting which.min to output 2 3.

Upvotes: 1

Views: 61

Answers (1)

akrun

Reputation: 886938

If we want all of the tied entries, compare against the minimum with ==, because which.min returns only the index of the first occurrence of the minimum.

mget(all_tibbles) %>% 
     map(~ .x %>%
             mutate(new = rowSums(!is.na(.))) %>%
             filter(new == min(new)) %>% 
             select(-new))
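
If the row indices themselves are wanted (the 2 3 the question expected) rather than the filtered rows, the same == comparison can be wrapped in which(). The following is a minimal base R sketch, not part of the original answer. It assumes plain data frames in place of the tibbles, and min_rows is a name introduced here only for illustration.

```r
# Same data as in the question, built as base data frames
# (assumption: tibbles give the same counts here).
foo <- data.frame(a = c("a", "b", NA), b = c("a", "b", "c"), colC = c("a", NA, "c"))
bar <- foo
all_tibbles <- c("foo", "bar")

# For each data frame, count the non-NA values per row and
# return every row index tied for the minimum count.
min_rows <- lapply(mget(all_tibbles), function(df) {
  counts <- unname(rowSums(!is.na(df)))  # drop row names so only indices remain
  which(counts == min(counts))           # all ties, unlike which.min()
})

print(min_rows)  # foo and bar each give 2 3
```

With purrr loaded, the lapply() call could presumably be replaced by map() with the same function body.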

Upvotes: 3
