Reputation: 5
I have two dataframes:
x <- data.frame('a'=c(1,1,2,2,3,3),'b'=c(1,2,1,2,1,2),'c'=c(1,2,3,2,1,3))
y <- data.frame('a'=c(1,2,3),'b'=c(1,1,1),'c'=c(1,2,3))
would like to extract rows from x as follow conditions:
x$a==y$a & x$b==y$b+1 & (if y$c == 1, then x$c %in% c(2,3); if ...)
result should like: x: a b c 1 2 2 only row 2 in x match. I have tried:
z <- y[which(y$a == x$a & y$b == x$b+1),]
to match at lease the first two conditions, but not work, the result is null. What is the suitable way to deal with it? Please help.
Upvotes: 0
Views: 371
Reputation: 42544
Perhaps, you can try to run an SQL statement with help of sqldf
:
sqldf("select x.* from x join y where
x.a = y.a and
x.b = y.b + 1 and
y.c = 1 and
x.c in (2, 3)")
a b c 1 1 2 2
This returns the expected result for the given sample dataset. However, it needs to be tested with a production dataset.
The third and fourth condition are "fixed", e.g., y$c == 1
and x$c %in% c(2, 3)
. So, x
and y
can be filtered before joining:
sqldf("select u.* from
(select * from x where c in (2, 3)) as u
join
(select * from y where c = 1) as v
where u.a = v.a")
a b c 1 1 2 2
Perhaps, this might result in a performance gain (untested!)
This approach can also be used in dplyr
:
library(dplyr)
x %>%
filter(c %in% 2:3) %>%
inner_join(
y %>%
filter(c == 1) %>%
mutate(bp1 = b + 1),
by = c("a", "b" = "bp1"))
a b c.x b.y c.y 1 1 2 2 1 1
Upvotes: 1
Reputation: 160437
I don't know that a simple "merge" is due here (due to inequality), but here's an attempt:
library(dplyr)
y %>%
mutate(bp1 = b + 1) %>%
inner_join(x, ., by = c("a", "b" = "bp1")) %>%
filter(c.y == 1 & c.x %in% c(2, 3))
# a b c.x b.y c.y
# 1 1 2 2 1 1
Note: the .x
and .y
are coincidentally the same as the frame from which they came, but that is just coincidence.
Upvotes: 0