Reputation: 1315
I'm trying to do a rather complicated filter using if else
and I keep getting turned around in circles.
Here is a sample of the data:
individual_id born mom_id trap_date
13212 2013-03-16 11926 2014-06-26
13213 2013-03-16 11926 2013-07-23
13214 2013-03-16 11926 2015-06-01
13212 2013-03-16 11926 2014-06-26
11926 2010-04-08 10422 2013-01-23
11926 2010-04-08 10422 2013-02-01
11926 2010-04-08 10422 2013-03-08
11926 2010-04-08 10422 2013-03-12
11926 2010-04-08 10422 2013-04-23
11926 2010-04-08 10422 2013-05-01
11926 2010-04-08 10422 2013-08-23
11926 2010-04-08 10422 2013-09-01
11926 2010-04-08 10422 2013-10-23
11926 2010-04-08 10422 2013-12-01
I have a column called mom_id
and these individuals also show up in the individual_id
column.
I'd like to remove any trap_date
records that occur - 35 days or + 70 days from the born
date for a mom_id
, only if she also shows up in the individual_id
column.
So, in this example, I'd like to remove all trap_date
records/rows that are between February 9, 2013 (2013-02-09
) and May 25, 2013 (2013-05-25
) because the born
date for mom_id
11926 is March 16, 2013 (2013-03-16
).
Desired result:
individual_id born mom_id trap_date
13212 2013-03-16 11926 2014-06-26
13213 2013-03-16 11926 2013-07-23
13214 2013-03-16 11926 2015-06-01
13212 2013-03-16 11926 2014-06-26
11926 2010-04-08 10422 2013-01-23
11926 2010-04-08 10422 2013-02-01
11926 2010-04-08 10422 2013-08-23
11926 2010-04-08 10422 2013-09-01
11926 2010-04-08 10422 2013-10-23
11926 2010-04-08 10422 2013-12-01
A dplyr
solution would be appreciated, but I'd appreciate any help at this point!
Upvotes: 2
Views: 352
Reputation: 42592
If I understand correctly, this can be solved using a non-equi anti-join:
library(data.table)
df2 <- unique(setDT(df1)[, .(mom_id, lo = born - 35, hi = born + 70)])
df1[!df1[df2, on = c("individual_id == mom_id", "trap_date >= lo", "trap_date <= hi"),
nomatch = NULL, which = TRUE]]
individual_id born mom_id trap_date 1: 19262 2013-03-16 11926 2014-06-26 2: 19263 2013-03-16 11926 2013-07-23 3: 19264 2013-03-16 11926 2015-06-01 4: 19262 2013-03-16 11926 2014-06-26 5: 11926 2010-04-08 10422 2013-01-23 6: 11926 2010-04-08 10422 2013-02-01 7: 11926 2010-04-08 10422 2013-08-23 8: 11926 2010-04-08 10422 2013-09-01 9: 11926 2010-04-08 10422 2013-10-23 10: 11926 2010-04-08 10422 2013-12-01
For each unique mom_id
the range lo
to hi
of dates to be removed is computed. So, df2
becomes
mom_id lo hi 1: 11926 2013-02-09 2013-05-25 2: 10422 2010-03-04 2010-06-17
Now, the row numbers of the rows to be removed are identified by a non-equi join
df1[df2, on = c("individual_id == mom_id", "trap_date >= lo", "trap_date <= hi"),
nomatch = NULL, which = TRUE]
which returns
[1] 7 8 9 10
Finally, these rows are removed from df1
by subsetting (using !
)
library(data.table)
df1 <- fread("individual_id born mom_id trap_date
19262 2013-03-16 11926 2014-06-26
19263 2013-03-16 11926 2013-07-23
19264 2013-03-16 11926 2015-06-01
19262 2013-03-16 11926 2014-06-26
11926 2010-04-08 10422 2013-01-23
11926 2010-04-08 10422 2013-02-01
11926 2010-04-08 10422 2013-03-08
11926 2010-04-08 10422 2013-03-12
11926 2010-04-08 10422 2013-04-23
11926 2010-04-08 10422 2013-05-01
11926 2010-04-08 10422 2013-08-23
11926 2010-04-08 10422 2013-09-01
11926 2010-04-08 10422 2013-10-23
11926 2010-04-08 10422 2013-12-01")
Upvotes: 1
Reputation: 444
First we convert both born
and trap_date
to dates, in case they're only text. Then we check the mom_id
. Finally, we check the trap_date
.
df %>%
mutate(
across(c(born, trap_date), lubridate::ymd), # applies ymd parser over both columns
in_individual = mom_id %in% individual_id,
within_interval =
trap_date %within%
interval(
start = born - 35,
end = .born + 70)) %>%
filter(in_individual, within_interval) # filter out conditional on both
df %>%
mutate(
across(c(born, trap_date), lubridate::ymd), # applies ymd parser over both columns
in_individual = mom_id %in% individual_id,
within_interval = purrr::map2_lgl(
.x = trap_date,
.y = born,
.f = ~ .x %within%
interval(
start = .y - 35,
end = .y + 70)) %>%
filter(in_individual, within_interval) # filter out conditional on both
Upvotes: 0
Reputation: 399
I think this does what you're looking for.
mutate(df, in_individual = mom_id %in% individual_id) %>%
filter((!in_individual) | ((born - 35 <= trap_date) & (trap_date <= born + 70)))
Upvotes: 2