Reputation: 2040
I have the following two data frames, dat1
and dat2
:
library(tidyverse)
dat1 <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 1,
"A", 1, 2,
"A", 1, 3,
"A", 1, 4,
"A", 1, 5,
"A", 2, 1,
"A", 2, 2,
"A", 2, 3,
"A", 2, 4,
"A", 2, 5,
"B", 1, 1,
"B", 1, 2,
"B", 1, 3,
"B", 1, 4,
"B", 1, 5,
"B", 2, 1,
"B", 2, 2,
"B", 2, 3,
"B", 2, 4,
"B", 2, 5,
)
dat2 <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 3,
"B", 2, 4
)
And I am trying to filter the records in dat1
based on the following conditions:
subj
and drive
columns of dat1
should match the subj
and drive
columns of dat2
, andmeasure
values in dat1
should fall within a range of the measure
values in dat2
.For this example, say the range is one unit apart. So, my result data frame will look like the following:
result <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 2,
"A", 1, 3,
"A", 1, 4,
"B", 2, 3,
"B", 2, 4,
"B", 2, 5
)
I am aware of dplyr::semi_join()
but it doesn't let me filter based on a range. Any ideas how I can solve this problem? Tidyverse
based solutions will be great!
Upvotes: 2
Views: 85
Reputation: 28685
Edited to use native sqldf string substitution as mentioned by GG's comment instead of sprintf.
library(sqldf)
check_range <- 1
fn$sqldf('
select one.*
from dat1 one
join dat2 two
on one.subj = two.subj
and one.drive = two.drive
and one.measure - two.measure between -`check_range` and `check_range`
')
# subj drive measure
# 1 A 1 2
# 2 A 1 3
# 3 A 1 4
# 4 B 2 3
# 5 B 2 4
# 6 B 2 5
Upvotes: 3
Reputation: 42544
For the sake of completeness, here is also a solution which uses a non-equi join:
library(data.table)
range <- 1
idx <- setDT(dat1)[
setDT(dat2)[, .(subj, drive, lower = measure - range, upper = measure + range)],
on = .(subj, drive, measure >= lower, measure <= upper), which = TRUE]
dat1[idx]
subj drive measure 1: A 1 2 2: A 1 3 3: A 1 4 4: B 2 3 5: B 2 4 6: B 2 5
Upvotes: 1
Reputation: 887118
An option is to first do an inner_join
and then use between
library(dplyr)
inner_join(dat1, dat2, by = c('subj', 'drive')) %>%
group_by(subj, drive) %>%
filter(between(measure.x, first(measure.y)-1, first(measure.y) + 1)) %>%
select(measure = measure.x)
# A tibble: 6 x 3
# Groups: subj, drive [2]
# subj drive measure
# <chr> <dbl> <dbl>
#1 A 1 2
#2 A 1 3
#3 A 1 4
#4 B 2 3
#5 B 2 4
#6 B 2 5
or with data.table
library(data.table)
setDT(dat1)[setDT(dat2), .SD[between(measure, i.measure -1,
i.measure + 1)], on = .(subj, drive), by = .EACHI]
# subj drive measure
#1: A 1 2
#2: A 1 3
#3: A 1 4
#4: B 2 3
#5: B 2 4
#6: B 2 5
Upvotes: 3