Reputation: 693
I'm trying to remove the factors from my dataframe, but only those after a specific date. Here I made a toy example:
I have a test
dataframe, and an inspection dataframe inspec
. I would like to remove the letters that are in var1
that appear in inspec
, but only the rows after the date in inspec
. For example, consider
> test = data.frame(var1 = c("A", "B", "A", "B", "C","B", "A"), measure = c(6,7,8,6,10,1,0), date = as.Date(c("2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07", "2021-01-12")))
> test
var1 measure date
1 A 6 2021-01-02
2 B 7 2021-01-03
3 A 8 2021-01-04
4 B 6 2021-01-05
5 C 10 2021-01-06
6 B 1 2021-01-07
7 A 0 2021-01-12
>
> inspec = data.frame(var1 = c("A", "C", "D", "A"), date = as.Date(c("2021-01-03", "2021-01-06", "2021-01-10", "2021-01-12")))
> inspec
var1 date
1 A 2021-01-03
2 C 2021-01-06
3 D 2021-01-10
4 A 2021-01-12
Then, as result, I'd like to obtain:
> test
var1 measure date
1 A 6 2021-01-02
2 B 7 2021-01-03
3 B 6 2021-01-05
4 B 1 2021-01-07
Note that only the A
in var1
that were inspected after the date indicated in the inspec
dataframe were excluded. If I didnt' want to maintain the var1
before a inspec
date, I could just use test= test[!(test$var1 %in% inspec$var1),]
Any hint on how can I do that?
Upvotes: 1
Views: 60
Reputation: 28825
We can join on var1
and filter based on dates
while the data is grouped by var1
and only keep the first match. See below;
library(dplyr)
test %>%
left_join(inspec, by = "var1", suffix = c("", ".y")) %>%
group_by(var1) %>%
filter(is.na(date.y) | date < first(date.y)) %>%
select(-date.y) %>%
group_by_all() %>%
slice(1)
#> # A tibble: 4 x 3
#> # Groups: var1, measure, date [4]
#> var1 measure date
#> <fct> <dbl> <date>
#> 1 A 6 2021-01-02
#> 2 B 1 2021-01-07
#> 3 B 6 2021-01-05
#> 4 B 7 2021-01-03
This is a variation of r2evans's answer.
Upvotes: 2
Reputation: 160437
## reduce `inspec` to the earliest date
inspec$date <- as.Date(inspec$date)
tmpinspec <- inspec[ave(as.integer(inspec$date), inspec$var1, FUN = function(z) z == min(z)) > 0,]
tmpinspec
# var1 date
# 1 A 2021-01-03
# 2 C 2021-01-06
# 3 D 2021-01-10
tmp <- merge(test, tmpinspec, by = "var1", all.x = TRUE, suffixes = c("", ".y"))
tmp
# var1 measure date date.y
# 1 A 6 2021-01-02 2021-01-03
# 2 A 8 2021-01-04 2021-01-03
# 3 A 0 2021-01-12 2021-01-03
# 4 B 7 2021-01-03 <NA>
# 5 B 6 2021-01-05 <NA>
# 6 B 1 2021-01-07 <NA>
# 7 C 10 2021-01-06 2021-01-06
tmp <- tmp[with(tmp, is.na(date.y) | date < date.y),]
# tmp$date.y <- NULL
# tmp
var1 measure date
# 1 A 6 2021-01-02
# 4 B 7 2021-01-03
# 5 B 6 2021-01-05
# 6 B 1 2021-01-07
library(dplyr)
group_by(inspec, var1) %>%
slice_min(date) %>%
left_join(test, ., by = "var1", suffix = c("", ".y")) %>%
filter(is.na(date.y) | date < date.y) %>%
select(-date.y)
# var1 measure date
# 1 A 6 2021-01-02
# 2 B 7 2021-01-03
# 3 B 6 2021-01-05
# 4 B 1 2021-01-07
Upvotes: 2
Reputation: 365
Using data.table, we can also use the merge and then filter:
library(data.table)
test <- setDT(test); inspec <- setDT(inspec)
test <- merge(test, inspec, by = "var1", all.x = TRUE, suffixes = c("", ".y"))
test <- test[date < date.y | !is.na(date.y), .(var1, measure, date)]
Upvotes: 0