Reputation: 3092
I'm trying to filter the rows of a data frame based on data inside a nested data frame column. Consider the following example:
library(tidyverse)
df <- structure(list(id = c(47L, 47L, 45L, 45L, 85L, 85L), src = c("bycity",
"indb", "bycity", "indb", "bycity", "indb"), lat = c(42.73856678,
NA, 39.40803248, 39.40620766, 42.52458775, NA), lon = c(-85.82890251,
-85.654987, -88.47774221, -88.50701219, -87.26410992, -83.647894)), .Names = c("id",
"src", "lat", "lon"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame")
) %>%
  nest(-id) %>%
  mutate(
    anothervar = c(0.077537764, NA, 0.029326812)
  )
# only keep the rows where the lat in the indb row is NA
filtereddf <- df %>%
  filter(map(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )
# Error in filter_impl(.data, quo) :
# Argument 2 filter condition does not evaluate to a logical vector
# desired output would be the two rows where data[[2,2]] is NA
# A tibble: 2 x 3
     id data             anothervar
  <int> <list>                <dbl>
1    47 <tibble [2 x 3]> 0.07753776
3    85 <tibble [2 x 3]> 0.02932681
The nested data frames I'm filtering on have consistent column names and I always want to ONLY look at the 2nd row.
I suppose I could unnest the data frame (giving me two rows per ID where I previously had only one), filter it down to a list of IDs that meet my criteria, and use anti_join() to throw out the offending rows, but I'm more interested in learning why using map() in a filter isn't working the way I expect it to.
Why am I receiving this error and how can I filter on a nested data frame column?
Upvotes: 3
Views: 1822
Reputation: 991
You want to use map_lgl(). map() returns a list, whereas map_lgl() returns a logical vector, which is what filter() needs as its condition.
filtereddf <- df %>%
  filter(map_lgl(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )
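To see the difference for yourself, here is a minimal sketch on a small made-up tibble. The names toy, as_list, as_lgl and kept are hypothetical and are not from the question; it assumes dplyr, purrr and tibble are installed.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data: one nested tibble per id; row 2 is the "indb" row
toy <- tibble(
  id = c(1L, 2L, 3L),
  data = list(
    tibble(src = c("bycity", "indb"), lat = c(42.7, NA)),
    tibble(src = c("bycity", "indb"), lat = c(39.4, 39.5)),
    tibble(src = c("bycity", "indb"), lat = c(42.5, NA))
  )
)

# map() returns a list with one element per nested tibble,
# which filter() cannot use as a condition
as_list <- map(toy$data, ~ is.na(pluck(.x, "lat", 2)))
class(as_list)  # "list"

# map_lgl() returns a plain logical vector of the same length
as_lgl <- map_lgl(toy$data, ~ is.na(pluck(.x, "lat", 2)))
as_lgl  # TRUE FALSE TRUE

# The logical vector works directly as a filter() condition
kept <- filter(toy, map_lgl(data, ~ is.na(pluck(.x, "lat", 2))))
kept$id  # 1 3
```

map_lgl() also raises an error if any element does not yield a single logical value, so it doubles as a check that the function really produces one TRUE/FALSE per row.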
Upvotes: 11