Reputation: 350
I have two dataframes: df1 and df2 (see example below). df1 contains a numeric start and end value per character id. df2 contains multiple events per character id, including a numeric time value.
library(dplyr)
df1 <- data_frame(id = c("A", "B"),
start = c(2, 3),
end = c(5, 9))
df2 <- data_frame(id = c(rep("A", 4), rep("B", 4)),
time = c(1, 3, 5, 8, 2, 3, 4, 10),
keep = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE))
I am trying to filter events in df2 using dplyr based on the time value being equal to or in between the start and end values per id in df1. So the filter is "df2$time >= df1$start & df2$time <= df1$end" which should be executed for each id separatly. I've added a logical column "keep" to show which rows I want to keep in df2.
How can I execute this filter for each id? Preferably using dplyr. This should be the endresult:
df2 %>%
filter(keep == TRUE)
Any help is much appreciated!
Upvotes: 0
Views: 1470
Reputation: 83275
Using the non-equi join capability of data.table:
# load the package and convert the dataframes to data.tables
library(data.table)
setDT(df1)
setDT(df2)
# non-equi join
df2[df1
, on = .(id, time >= start, time <= end)
, .(id, time = x.time)]
which gives:
id time 1: A 3 2: A 5 3: B 3 4: B 4
What this does:
setDT()
converts a data.frame to a data.tabledf1
with df2
. by using on = .(id, time >= start, time <= end)
you join on the exact match of id
while at the same time time
has to higher or equal to start
from df1
and lower or equal to end
from df1
.time
-columns from df1
(start
and end
) as time
and time
for the cases when the join-conditions are met (see also here). To get an idea of what I mean, you can just do df2[df1, on = .(id, time >= start, time <= end)]
..(id, time = x.time)
you get the desired columns back. x.time
refers to the time
-column from the x-data.table which is df2
.Upvotes: 2
Reputation: 5958
This also works
df2$start <- df1[match(df2$id, df1$id),"start"]
df2$end <- df1[match(df2$id, df1$id),"end"]
df2$keep <- df2$time>df2$start& df2$time<df2$end
result <- df2 %>% filter(keep)
result
Upvotes: 0
Reputation: 9525
What about something like this, with dplyr
:
df1 %>%
left_join(df2) %>% #joining to have one dataset
filter(time <= end, time >= start) %>% # filter, you can use <, > in case
select(-c(2,3)) # remove useless column if necessary
# A tibble: 4 x 3
id time keep
<chr> <dbl> <lgl>
1 A 3 TRUE
2 A 5 TRUE
3 B 3 TRUE
4 B 4 TRUE
Upvotes: 4