Filter based on data in a nested data frame column using purrr

Question

I'm trying to filter the rows of a data frame based on data inside a nested data frame column. Consider the following example:

library(tidyverse)

df <- tibble(
  id  = c(47L, 47L, 45L, 45L, 85L, 85L),
  src = c("bycity", "indb", "bycity", "indb", "bycity", "indb"),
  lat = c(42.73856678, NA, 39.40803248, 39.40620766, 42.52458775, NA),
  lon = c(-85.82890251, -85.654987, -88.47774221, -88.50701219, -87.26410992, -83.647894)
) %>%
  # one row per id; src/lat/lon go into a list-column of tibbles called `data`
  # (tidyr >= 1.0 spelling of the older nest(-id))
  nest(data = -id) %>%
  mutate(
    anothervar = c(0.077537764, NA, 0.029326812)
  )


# only keep the rows where the lat in the indb row is NA
filtereddf  <- df %>% 
   filter(map(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )

# Error in filter_impl(.data, quo) : 
#   Argument 2 filter condition does not evaluate to a logical vector


# desired output would be the two rows where data[[2,2]] is NA
# A tibble: 2 x 3
     id             data anothervar
  <int>           <list>      <dbl>
1    47 <tibble [2 x 3]> 0.07753776
2    85 <tibble [2 x 3]> 0.02932681

The nested data frames I'm filtering on have consistent column names and I always want to ONLY look at the 2nd row.
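A small sketch of what the lambda sees for one row of `data`. The tibble below is a hand-built stand-in using the id 47 values (an illustration, not pulled from `df`). `pluck(x, "lat", 2)` walks `x[["lat"]][[2]]`, so for these nested tibbles it always reads the `lat` of the second row.

```r
library(tibble)
library(purrr)

# Hand-built stand-in for the nested tibble of id 47
one <- tibble(
  src = c("bycity", "indb"),
  lat = c(42.73856678, NA),
  lon = c(-85.82890251, -85.654987)
)

# pluck(one, "lat", 2) is equivalent to one[["lat"]][[2]]:
# the lat value in the second (indb) row
second_lat <- pluck(one, "lat", 2)
second_lat_base <- one[["lat"]][[2]]

# The lambda in the question yields a single TRUE/FALSE per nested tibble
keep_this_row <- is.na(second_lat)

print(second_lat)     # [1] NA
print(keep_this_row)  # [1] TRUE
```

For each nested tibble the lambda produces exactly one logical value; the trouble only starts when those values are collected with `map()`.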

I suppose I could unnest the data frame (giving me two rows per ID where I previously had only one), filter down to the IDs that meet my criteria, and then use anti_join() to throw out the offending rows. But I'm more interested in learning why using map() inside filter() isn't working the way I expect it to.
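The unnest-then-`anti_join()` workaround described above could look roughly like the sketch below. The name `offending_ids` is hypothetical, and picking the second row of each id with `row_number() == 2` after unnesting assumes the nested rows keep their original order.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  id  = c(47L, 47L, 45L, 45L, 85L, 85L),
  src = c("bycity", "indb", "bycity", "indb", "bycity", "indb"),
  lat = c(42.73856678, NA, 39.40803248, 39.40620766, 42.52458775, NA),
  lon = c(-85.82890251, -85.654987, -88.47774221, -88.50701219, -87.26410992, -83.647894)
) %>%
  nest(data = -id) %>%
  mutate(anothervar = c(0.077537764, NA, 0.029326812))

# Hypothetical helper table: ids whose 2nd nested row has a non-missing lat
offending_ids <- df %>%
  unnest(data) %>%                        # two rows per id again
  group_by(id) %>%
  filter(row_number() == 2, !is.na(lat)) %>%
  ungroup() %>%
  distinct(id)

# Keep only the rows of df whose id is NOT among the offending ids
filtered_alt <- df %>%
  anti_join(offending_ids, by = "id")

filtered_alt
```

This works, but it round-trips through a long table just to compute one flag per row, which a per-row logical over `data` can do directly.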

Why am I receiving this error and how can I filter on a nested data frame column?

Lucy · Accepted Answer

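Use map_lgl() instead of map(). map() always returns a list, but filter() needs a logical vector with one TRUE/FALSE per row. map_lgl() returns exactly that, and it throws an error if any element doesn't evaluate to a single logical value.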

filtereddf  <- df %>% 
   filter(map_lgl(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )
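To make the type difference concrete, this sketch rebuilds the example (using `nest(data = -id)`, the tidyr >= 1.0 spelling of `nest(-id)`) and compares what `map()` and `map_lgl()` return for the same lambda. The variable names `second_lat_is_na`, `via_map` and `via_lgl` are only for illustration.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(
  id  = c(47L, 47L, 45L, 45L, 85L, 85L),
  src = c("bycity", "indb", "bycity", "indb", "bycity", "indb"),
  lat = c(42.73856678, NA, 39.40803248, 39.40620766, 42.52458775, NA),
  lon = c(-85.82890251, -85.654987, -88.47774221, -88.50701219, -87.26410992, -83.647894)
) %>%
  nest(data = -id) %>%
  mutate(anothervar = c(0.077537764, NA, 0.029326812))

# Same lambda as in the question: is the lat in the 2nd nested row NA?
second_lat_is_na <- ~ is.na(pluck(.x, "lat", 2))

via_map <- map(df$data, second_lat_is_na)      # a list of length-1 logicals
via_lgl <- map_lgl(df$data, second_lat_is_na)  # a plain logical vector

class(via_map)  # "list"
class(via_lgl)  # "logical"

# filter() accepts the logical vector, keeping the rows for ids 47 and 85
filtereddf <- df %>%
  filter(map_lgl(data, second_lat_is_na))

filtereddf
```

Even though every element of `via_map` is a single `TRUE` or `FALSE`, the container is still a list, which is why `filter()` rejects it.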
