Reputation: 11
this question is about Tibbles in tidyverse package of R. I have created the below example to represent my data .
Tibble 'ab' is a list of people (column a) and a date that something specific happened (column b) e.g. received a vaccine.
Tibble 'cd' is a list of the same people (column c) but includes all event dates recorded for that person (column d).
I would like a table similar to Tibble 'cd' but only where the date (column d) is earlier than the date for each person in Tibble 'ab' (column b)(e.g. date receiving a vaccine). In other words, excluding all events that happen after the vaccine date in Tibble 'ab', for each person.
a <- 1001:1007
b <- c ("20/03/1998", "04/05/2012", "06/07/1989", "14/08/1990", "30/01/1978", "09/09/2011","23/11/2005")
ab <- tibble (a,b)
ab_date <- ab %>% mutate (b = as.Date (b, format = "%d/%m/%Y"))
c <- c( 1001, 1001, 1001, 1001, 1002, 1002, 1002)
d <- c( "12/02/1996", "14/05/1997", "20/03/1998", "23/03/1998", "10/10/1992", "04/05/2012", "13/11/2020")
cd <- tibble (c,d)
cd_date <- cd %>% mutate (d = as.Date (d, format = "%d/%m/%Y"))
For this example it would look like this:
e <- c(1001, 1001, 1002)
f <- c( "12/02/1996","14/05/1997", "10/10/1992")
ef <- tibble (e,f)
ef_date <- ef %>% mutate (f = as.Date (f, format = "%d/%m/%Y"))
I have tried a few things with inner_join and fuzzy_join but am not really sure how to apply these for what I am trying to do!
Any help would be greatly appreciated!
Thanks
Luke
Upvotes: 1
Views: 405
Reputation: 30474
With fuzzyjoin
you can try the following. Upon joining, you would want columns a
and c
to be equal, and keep dates where b
comes after d
. Using an inner join, you will only keep matching rows in both data frames.
library(tidyverse)
library(fuzzyjoin)
ab_date %>%
fuzzy_inner_join(cd_date,
by = c("a" = "c", "b" = "d"),
match_fun = c(`==`, `>`)) %>%
select(c, d)
Output
c d
<dbl> <date>
1 1001 1996-02-12
2 1001 1997-05-14
3 1002 1992-10-10
Upvotes: 1