Reputation: 89
I expect to remove rows based on several condition and dates differences between different events, per ids.
my data look like this
#dat
id name date
10 "BREEDING" 2019-05-17
10 "OTHER" 2020-01-01
11 "BREEDING" 2020-07-01
11 "GESTATION POSITIF" 2020-09-01
12 "BREEDING" 2020-06-26
12 "GESTATION NEGATIF" 2020-08-01
21 "OTHER" 2018-06-20
21 "GESTATION POSITIF" 2018-10-15
22 "BREEDING" 2020-08-07
22 "GESTATION POSITIF" 2020-09-11
what do I wish is, per ids, for those who got name "BREEDING" and "GESTATION" (doesn't matter if it is positive or negative), to compute the date difference between the events "GESTATION" and the events "BREEDING".
Then if the date difference is 34, 35, 36, to remove the row BREEDING who match this condition, within the object dat.
I got a loop solution which is doing the job, but I'd like to improve this. I made several trials but did not succeed. Basically I assume that the right code should be something "like" this
dat %>%
group_by(id) %>%
grepl("GESTATION",name) %>% #keep ids with an event gestation recorded
grepl("BREEDING",name) %>% #keep ids with an event breeding recorded
mutate(new_var = "date GESTATION"-"date BREEDING") %>%#per ids, compute de difference of date beetween events gestation event and breeding
filter(!(new_var %in% c(34:36))) %>% #if an event of breeding happened 35 days before the gestation event, +/- 1 day, remove
ungroup()
finally my result should be, for this example,
#dat
id name date
10 "BREEDING" 2019-05-17
10 "OTHER" 2020-01-01
11 "BREEDING" 2020-07-01
11 "GESTATION POSITIF" 2020-09-01
12 "GESTATION NEGATIF" 2020-08-01
21 "OTHER" 2018-06-20
21 "GESTATION POSITIF" 2018-10-15
22 "GESTATION POSITIF" 2020-09-11
Upvotes: 2
Views: 197
Reputation: 674
This is also probably not the most clean solution but pivoting to wide format and then back to long works:
library(tidyverse)
library(lubridate)
dat %>%
separate(name, into = c("name", "gest"), fill = "right") %>%
pivot_wider(names_from = name, values_from = c(date, gest)) %>%
mutate(date_BREEDING = if_else((date_GESTATION - date_BREEDING) %in% c(34, 35, 36), NA_Date_, date_BREEDING)) %>%
pivot_longer(cols = c(date_BREEDING, date_OTHER, date_GESTATION), values_to = "date", values_drop_na = T) %>%
select(-gest_BREEDING, -gest_OTHER) %>%
mutate(name = str_sub(name, 6))
The output is:
id gest_GESTATION name date
<dbl> <chr> <chr> <date>
1 10 NA BREEDING 2019-05-17
2 10 NA OTHER 2020-01-01
3 11 POSITIF BREEDING 2020-07-01
4 11 POSITIF GESTATION 2020-09-01
5 12 NEGATIF GESTATION 2020-08-01
6 21 POSITIF OTHER 2018-06-20
7 21 POSITIF GESTATION 2018-10-15
8 22 POSITIF GESTATION 2020-09-11
Which has the additional advantage of saving whether "GESTATION" is positive or negative in a separate variable. If you do not need that and want exactly the desired output specified in your question you can add:
%>%
mutate(name = if_else(is.na(gest_GESTATION), name, str_c(name, gest_GESTATION, sep = " "))) %>%
select(-gest_GESTATION)
Upvotes: 1