I'm aware there are many similar questions that already have answers, for example: Filter data.frame rows by a logical condition. The problem is that those answers don't work when the column is a list.
I'm using the Yelp businesses dataset, which I loaded with the jsonlite library (flattening the result). One of the columns, the business categories, is a list of character vectors.
> typeof(business_df$categories)
[1] "list"
> business_df[1:3, "categories"]
[[1]]
[1] "Shopping" "Shopping Centers"
[[2]]
[1] "Food" "Soul Food" "Convenience Stores" "Restaurants"
[[3]]
[1] "Food" "Coffee & Tea"
For now, I have this horrible solution:
filterByCategory <- function(category) {
  filtered_df <- business_df
  if (category != "All") {
    # each row reaches the function as a one-element list, hence x[[1]]
    filtered_df[, "belongs"] <-
      apply(filtered_df["categories"], 1, function(x)
        is.element(category, x[[1]]))
    filtered_df <- subset(filtered_df, belongs)
  }
  filtered_df
}
As you can see, I need to access each row's categories with the [[1]] syntax. This is why I think none of these solutions work:
# All rows returned
business_df[category %in% business_df$categories]
subset(business_df, category %in% business_df$categories)
# No rows returned
business_df %>% filter(category %in% categories)
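A likely reason these attempts misbehave: category %in% business_df$categories compares a single string against the list as a whole, so it returns one logical value rather than one per row. The minimal base R sketch below uses a toy list column (illustrative data, not the Yelp dataset) to show that difference and a per-row alternative with vapply.

```r
# Toy data frame with a list column (illustrative data, not the Yelp dataset)
df <- data.frame(rownum = 1:3)
df$categories <- list(c("a", "b"), c("c", "d"), c("d", "e"))

# %in% with a length-1 left-hand side returns a single logical, not one per row
whole_list <- "d" %in% df$categories
length(whole_list)  # 1

# Test each row's vector separately: one TRUE/FALSE per row
has_d <- vapply(df$categories, function(x) "d" %in% x, logical(1))
has_d  # FALSE TRUE TRUE

# Use the per-row logical vector to subset rows
df[has_d, ]
```

The logical(1) template makes vapply fail loudly if the function ever returns something other than exactly one logical per row.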
It sounds like you are trying to filter a data frame where a list column contains a specific value. categories is a list of vectors, and map_lgl maps each element (vector) of the list to a single logical value, which filter can then use to keep or drop the corresponding row.
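To see what map_lgl contributes on its own, here is a small sketch applying it directly to a plain list of character vectors (toy values mirroring the tribble used in this answer). The formula ~ 'd' %in% . is purrr's shorthand for an anonymous function of one argument.

```r
library(purrr)

# A plain list of character vectors, like the categories column
categories <- list(c('a', 'b'), c('c', 'd'), c('d', 'e'))

# One logical per list element: does that element's vector contain 'd'?
contains_d <- map_lgl(categories, ~ 'd' %in% .)
contains_d  # FALSE TRUE TRUE
```

Because the result has exactly one value per element, it lines up with the rows of the data frame and can be passed straight to filter.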
library('tidyverse')
df <- tribble(
  ~rownum, ~categories,
  1, c('a', 'b'),
  2, c('c', 'd'),
  3, c('d', 'e')
)
# All rows containing the 'd' category
df %>%
  filter(map_lgl(categories, ~'d' %in% .)) %>%
  str
#> Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 2 variables:
#> $ rownum : num 2 3
#> $ categories:List of 2
#> ..$ : chr "c" "d"
#> ..$ : chr "d" "e"
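Applied to the question's filterByCategory, the same idea might look like the sketch below. The name filter_by_category is hypothetical, it takes the data frame as an argument instead of reading a global, and it keeps the original convention that "All" returns every row. It assumes the data frame has no column named category, since filter would otherwise look that name up in the data first.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper mirroring the question's filterByCategory():
# "All" keeps every row; otherwise keep rows whose categories contain `category`
filter_by_category <- function(df, category) {
  if (category == "All") {
    return(df)
  }
  df %>% filter(map_lgl(categories, ~ category %in% .x))
}

df <- tribble(
  ~rownum, ~categories,
  1, c('a', 'b'),
  2, c('c', 'd'),
  3, c('d', 'e')
)

filter_by_category(df, "d")    # rows 2 and 3
filter_by_category(df, "All")  # all 3 rows
```

Returning the filtered data frame, rather than assigning it with <<-, keeps the function free of side effects and lets the caller decide where the result goes.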