Using R 3.1.2, dplyr 0.4.0.
I'm trying to use a filter within a filter, which sounds very simple, and I don't understand why it doesn't give me the result I expect. This is code I wrote about 6 months ago and I'm fairly certain it worked, so either it stopped working because of an updated R version, dplyr, or some other dependency. Anyway, here is some simple code that filters rows from df1 based on a condition that is found with a filter on a column in df2.
df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)
dplyr::filter(df1, x %in% (dplyr::filter(df2, y)$x))
I expect this to show the first row of df1, but instead I get:
# [1] x
# <0 rows> (or 0-length row.names)
which I'm not sure what to make of. Why is it returning a vector AND an empty data.frame?
If I break up the filter code into two separate statements, I get what I expect:
xval <- dplyr::filter(df2, y)$x
dplyr::filter(df1, x %in% xval)
# x
# 1 A
Can anyone help me figure out why this behaviour is happening? I'm not saying it's a bug, but I don't understand it.
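For reference, a plain base R version of the same two-step logic, with no dplyr involved, is a quick way to confirm what the expected result is. This is only a sanity-check sketch on the question's own data, not a fix for the nested call:

```r
# Plain base R, no dplyr involved: same data as in the question
df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)

# Inner "filter": the values of df2$x on rows where y is TRUE
keep <- df2$x[df2$y]

# Outer "filter": rows of df1 whose x is among those values
# (drop = FALSE keeps the result a data.frame even with one column)
res <- df1[df1$x %in% keep, , drop = FALSE]
print(res)
#   x
# 1 A
```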
It's a valid question why your approach doesn't work (any more, apparently). I can't answer that, but I would suggest a different approach, as commented above, which avoids nested function calls (a filter inside another filter). IMO, that is what dplyr is made for: expressive, easy-to-read syntax that flows from left to right, top to bottom.
So for your example, because the columns you are interested in are both named "x", you can do:
filter(df2, y) %>% select(x) %>% inner_join(df1)
And if they were named differently, for example "z" in df2 and "x" in df1, you could use:
filter(df2, y) %>% select(z) %>% inner_join(df1, by = c("z" = "x"))
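A small self-contained sketch of the differently-named case, assuming a hypothetical df2 whose key column is called z instead of x. The detail that matters is that `select()` picks df2's own column name, while `by = c("z" = "x")` matches the left-hand table's z against df1's x:

```r
library(dplyr)

# Hypothetical variant of df2: the key column is named "z", not "x"
df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(z = "A", y = TRUE, stringsAsFactors = FALSE)

# Left table: filtered df2 reduced to its key column z.
# by = c("z" = "x") matches that z against df1's x.
res <- filter(df2, y) %>%
  select(z) %>%
  inner_join(df1, by = c("z" = "x"))
print(res)
```

In the joined result, the key column keeps the left-hand name, z.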
As noted by Hadley in his comment below, it would be safer to use a semi_join instead of inner_join here. The documentation says:
semi_join: return all rows from x where there are matching values in y, keeping just columns from x.
A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x.
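To make that difference concrete, here is a sketch with hypothetical data: "A" appears twice among the TRUE rows of df2, and df1 gets an extra column, val, that is not in the original question, so you can also see which columns survive:

```r
library(dplyr)

# Hypothetical data: duplicated key "A" in df2, extra column in df1
df1 <- data.frame(x = c("A", "B"), val = c(1, 2), stringsAsFactors = FALSE)
df2 <- data.frame(x = c("A", "A", "B"), y = c(TRUE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# The keys we want to filter df1 by: x values where y is TRUE
keys <- df2 %>% filter(y) %>% select(x)

# inner_join: one output row per matching row of keys, so "A" shows up twice
inner <- inner_join(df1, keys, by = "x")

# semi_join: rows of df1 with at least one match in keys, never duplicated,
# and only df1's columns are kept
semi <- semi_join(df1, keys, by = "x")

print(inner)
print(semi)
```

The inner join returns df1's "A" row twice, once per matching key, while the semi join returns it once, which is what you want when the goal is simply to filter df1.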
Hence, you could do for the example case:
# df1 goes first so that its rows and columns are the ones kept
df1 %>% semi_join(filter(df2, y), by = "x")