Blundering Ecologist
Blundering Ecologist

Reputation: 1315

If else statement to filter out rows using dates and matching values across multiple columns in R

I'm trying to do a rather complicated filter using if else and I keep getting turned around in circles.

Here is a sample of the data:

individual_id born        mom_id trap_date 
        13212 2013-03-16  11926  2014-06-26
        13213 2013-03-16  11926  2013-07-23
        13214 2013-03-16  11926  2015-06-01
        13212 2013-03-16  11926  2014-06-26
        11926 2010-04-08  10422  2013-01-23
        11926 2010-04-08  10422  2013-02-01
        11926 2010-04-08  10422  2013-03-08
        11926 2010-04-08  10422  2013-03-12
        11926 2010-04-08  10422  2013-04-23
        11926 2010-04-08  10422  2013-05-01
        11926 2010-04-08  10422  2013-08-23
        11926 2010-04-08  10422  2013-09-01
        11926 2010-04-08  10422  2013-10-23
        11926 2010-04-08  10422  2013-12-01

I have a column called mom_id and these individuals also show up in the individual_id column.

I'd like to remove any trap_date records that occur - 35 days or + 70 days from the born date for a mom_id, only if she also shows up in the individual_id column.

So, in this example, I'd like to remove all trap_date records/rows that are between February 9, 2013 (2013-02-09) and May 25, 2013 (2013-05-25) because the born date for mom_id 11926 is March 16, 2013 (2013-03-16).

Desired result:

individual_id born        mom_id trap_date 
        13212 2013-03-16  11926  2014-06-26
        13213 2013-03-16  11926  2013-07-23
        13214 2013-03-16  11926  2015-06-01
        13212 2013-03-16  11926  2014-06-26
        11926 2010-04-08  10422  2013-01-23
        11926 2010-04-08  10422  2013-02-01
        11926 2010-04-08  10422  2013-08-23
        11926 2010-04-08  10422  2013-09-01
        11926 2010-04-08  10422  2013-10-23
        11926 2010-04-08  10422  2013-12-01

A dplyr solution would be appreciated, but I'd appreciate any help at this point!

Upvotes: 2

Views: 352

Answers (3)

Uwe
Uwe

Reputation: 42592

If I understand correctly, this can be solved using a non-equi anti-join:

library(data.table)
df2 <- unique(setDT(df1)[, .(mom_id, lo = born - 35, hi = born + 70)])
df1[!df1[df2, on = c("individual_id == mom_id", "trap_date >= lo", "trap_date <= hi"),
    nomatch = NULL, which = TRUE]]
    individual_id       born mom_id  trap_date
 1:         19262 2013-03-16  11926 2014-06-26
 2:         19263 2013-03-16  11926 2013-07-23
 3:         19264 2013-03-16  11926 2015-06-01
 4:         19262 2013-03-16  11926 2014-06-26
 5:         11926 2010-04-08  10422 2013-01-23
 6:         11926 2010-04-08  10422 2013-02-01
 7:         11926 2010-04-08  10422 2013-08-23
 8:         11926 2010-04-08  10422 2013-09-01
 9:         11926 2010-04-08  10422 2013-10-23
10:         11926 2010-04-08  10422 2013-12-01

Explanation

  • For each unique mom_id the range lo to hiof dates to be removed is computed. So, df2 becomes

       mom_id         lo         hi
    1:  11926 2013-02-09 2013-05-25
    2:  10422 2010-03-04 2010-06-17
    
  • Now, the row numbers of the rows to be removed are identified by a non-equi join

     df1[df2, on = c("individual_id == mom_id", "trap_date >= lo", "trap_date <= hi"),
         nomatch = NULL, which = TRUE]  
    

    which returns

    [1] 7 8 9 10

  • Finally, these rows are removed from df1 by subsetting (using !)

Data

library(data.table)
df1 <- fread("individual_id born        mom_id trap_date 
        19262 2013-03-16  11926  2014-06-26
        19263 2013-03-16  11926  2013-07-23
        19264 2013-03-16  11926  2015-06-01
        19262 2013-03-16  11926  2014-06-26
        11926 2010-04-08  10422  2013-01-23
        11926 2010-04-08  10422  2013-02-01
        11926 2010-04-08  10422  2013-03-08
        11926 2010-04-08  10422  2013-03-12
        11926 2010-04-08  10422  2013-04-23
        11926 2010-04-08  10422  2013-05-01
        11926 2010-04-08  10422  2013-08-23
        11926 2010-04-08  10422  2013-09-01
        11926 2010-04-08  10422  2013-10-23
        11926 2010-04-08  10422  2013-12-01")

Upvotes: 1

Pedro Cavalcante
Pedro Cavalcante

Reputation: 444

First we convert both born and trap_date to dates, in case they're only text. Then we check the mom_id. Finally, we check the trap_date.

df %>%
  mutate(
    across(c(born, trap_date), lubridate::ymd), # applies ymd parser over both columns
    in_individual = mom_id %in% individual_id,
    within_interval = 
      trap_date %within% 
        interval(
          start = born - 35,
          end = .born + 70)) %>%
    filter(in_individual, within_interval) # filter out conditional on both

Edit

df %>%
  mutate(
    across(c(born, trap_date), lubridate::ymd), # applies ymd parser over both columns
    in_individual = mom_id %in% individual_id,
    within_interval = purrr::map2_lgl(
      .x = trap_date,
      .y = born,
      .f = ~ .x %within%
        interval(
          start = .y - 35,
          end = .y + 70)) %>%
    filter(in_individual, within_interval) # filter out conditional on both

Upvotes: 0

Akindele Davies
Akindele Davies

Reputation: 399

I think this does what you're looking for.

mutate(df, in_individual = mom_id %in% individual_id) %>%
    filter((!in_individual) | ((born - 35 <= trap_date) & (trap_date <= born + 70)))

Upvotes: 2

Related Questions