Reputation: 13
I would like to filter my dataset based on the following conditions:
Only for rows where Disease = ABC and Class = Colonized, and where First_Name and Last_Name match, keep just the row with the earliest Spec_Date.
Keep all rows for every other disease regardless of Class, and all rows with Disease = ABC and Class = Clinical.
Disease<-c("ABC","ABC","CRE","MCA","ABC","ABC","CRE","MCA")
Class<-c("Colonized","Colonized","Clinical","Clinical","Colonized","Clinical","Clinical","Clinical")
First_Name<-c("Roger","Roger","John","John","Roger","Mary","James","Lee")
Last_Name<-c("Smith","Smith","Doe","Doe","Smith","Poppins","Bond","Majors")
Spec_Date<-as.Date(c("2001-01-01","2002-01-01","2003-01-01","2003-01-01","2004-01-01","2001-01-01","2003-01-01","2004-01-01"))
df<-data.frame(Disease,Class,First_Name,Last_Name,Spec_Date)
The resulting dataset should be as below:
Disease<-c("ABC","CRE","MCA","ABC","CRE","MCA")
Class<-c("Colonized","Clinical","Clinical","Clinical","Clinical","Clinical")
First_Name<-c("Roger","John","John","Mary","James","Lee")
Last_Name<-c("Smith","Doe","Doe","Poppins","Bond","Majors")
Spec_Date<-as.Date(c("2001-01-01","2003-01-01","2003-01-01","2001-01-01","2003-01-01","2004-01-01"))
df2<-data.frame(Disease,Class,First_Name,Last_Name,Spec_Date)
Any help is really appreciated.
Upvotes: 0
Views: 89
Reputation: 9878
We need to group_by all the involved variables, then filter for the min(Spec_Date):
library(dplyr)

# Per-operation grouping with `.by` (dplyr >= 1.1.0)
df |>
  filter(Spec_Date == min(Spec_Date),
         .by = c(Disease, Class, First_Name, Last_Name))

# Or the long form, with `group_by`:
df |>
  group_by(Disease, Class, First_Name, Last_Name) |>
  filter(Spec_Date == min(Spec_Date)) |>
  ungroup()
# A tibble: 6 × 5
  Disease Class     First_Name Last_Name Spec_Date
  <chr>   <chr>     <chr>      <chr>     <date>
1 ABC     Colonized Roger      Smith     2001-01-01
2 CRE     Clinical  John       Doe       2003-01-01
3 MCA     Clinical  John       Doe       2003-01-01
4 ABC     Clinical  Mary       Poppins   2001-01-01
5 CRE     Clinical  James      Bond      2003-01-01
6 MCA     Clinical  Lee        Majors    2004-01-01
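One caveat: grouping on all four columns would also drop later rows for any other disease/class combination that repeats for the same person. That does not happen in the sample data, so the output above still matches. If only the ABC/Colonized rows should be deduplicated, a minimal sketch is to flag those rows first and let every other row pass through. The helper name keep_earliest_colonized and the temporary column target are made up for illustration, and `.by` assumes dplyr >= 1.1.0.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Disease    = c("ABC", "ABC", "CRE", "MCA", "ABC", "ABC", "CRE", "MCA"),
  Class      = c("Colonized", "Colonized", "Clinical", "Clinical",
                 "Colonized", "Clinical", "Clinical", "Clinical"),
  First_Name = c("Roger", "Roger", "John", "John", "Roger", "Mary", "James", "Lee"),
  Last_Name  = c("Smith", "Smith", "Doe", "Doe", "Smith", "Poppins", "Bond", "Majors"),
  Spec_Date  = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01", "2003-01-01",
                         "2004-01-01", "2001-01-01", "2003-01-01", "2004-01-01"))
)

# Hypothetical helper: deduplicate only ABC/Colonized rows per person
keep_earliest_colonized <- function(data) {
  data |>
    # TRUE only for the rows that should be reduced to the earliest date
    mutate(target = Disease == "ABC" & Class == "Colonized") |>
    # Non-target rows always pass; target rows pass only at the group minimum
    filter(!target | Spec_Date == min(Spec_Date),
           .by = c(target, First_Name, Last_Name)) |>
    select(-target)
}

res <- keep_earliest_colonized(df)
res
```

Because the condition starts with `!target`, a person with several CRE or MCA rows would keep all of them, which is what the question asks for.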
To slice on the minimum or maximum, we can also use the even simpler slice_min (or slice_max):
df |>
  slice_min(Spec_Date,
            by = c(Disease, Class, First_Name, Last_Name))
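Like the `Spec_Date == min(Spec_Date)` filter, slice_min keeps every row tied for the earliest date by default (with_ties = TRUE). If exactly one row per group is wanted, with_ties = FALSE can be set. A small sketch follows. The repeated first row is a hypothetical addition to the question's data, there only to create a tie.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Disease    = c("ABC", "ABC", "CRE", "MCA", "ABC", "ABC", "CRE", "MCA"),
  Class      = c("Colonized", "Colonized", "Clinical", "Clinical",
                 "Colonized", "Clinical", "Clinical", "Clinical"),
  First_Name = c("Roger", "Roger", "John", "John", "Roger", "Mary", "James", "Lee"),
  Last_Name  = c("Smith", "Smith", "Doe", "Doe", "Smith", "Poppins", "Bond", "Majors"),
  Spec_Date  = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01", "2003-01-01",
                         "2004-01-01", "2001-01-01", "2003-01-01", "2004-01-01"))
)

# Hypothetical tie: repeat the first row so two rows share the earliest date
df_tied <- bind_rows(df, df[1, ])

# Default behaviour: both tied rows are returned for Roger Smith
ties_kept <- df_tied |>
  slice_min(Spec_Date,
            by = c(Disease, Class, First_Name, Last_Name))

# with_ties = FALSE: exactly one row per group
one_each <- df_tied |>
  slice_min(Spec_Date, n = 1, with_ties = FALSE,
            by = c(Disease, Class, First_Name, Last_Name))

nrow(ties_kept)
nrow(one_each)
```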
Upvotes: 1
Reputation: 5263
Do you mean this?
EDIT
library(dplyr)

union_all(
  df %>%
    filter(Disease != "ABC" | Class != "Colonized"),
  df %>%
    filter(Disease == "ABC" & Class == "Colonized") %>%
    group_by(First_Name, Last_Name) %>%
    summarise(Disease = "ABC",
              Class = "Colonized",
              Spec_Date = min(Spec_Date) %>% as.Date())
)
  Disease     Class First_Name Last_Name  Spec_Date
1     CRE  Clinical       John       Doe 2003-01-01
2     MCA  Clinical       John       Doe 2003-01-01
3     ABC  Clinical       Mary   Poppins 2001-01-01
4     CRE  Clinical      James      Bond 2003-01-01
5     MCA  Clinical        Lee    Majors 2004-01-01
6     ABC Colonized      Roger     Smith 2001-01-01
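Since summarise() returns only the grouping columns plus the ones recreated by hand, any extra columns in a wider dataset would be lost. A possible variant, sketched here under the assumption that whole rows should be kept, swaps in slice_min() and bind_rows(). The name is_target is made up for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Disease    = c("ABC", "ABC", "CRE", "MCA", "ABC", "ABC", "CRE", "MCA"),
  Class      = c("Colonized", "Colonized", "Clinical", "Clinical",
                 "Colonized", "Clinical", "Clinical", "Clinical"),
  First_Name = c("Roger", "Roger", "John", "John", "Roger", "Mary", "James", "Lee"),
  Last_Name  = c("Smith", "Smith", "Doe", "Doe", "Smith", "Poppins", "Bond", "Majors"),
  Spec_Date  = as.Date(c("2001-01-01", "2002-01-01", "2003-01-01", "2003-01-01",
                         "2004-01-01", "2001-01-01", "2003-01-01", "2004-01-01"))
)

# Rows that should be reduced to the earliest date per person
is_target <- df$Disease == "ABC" & df$Class == "Colonized"

res <- bind_rows(
  # Everything else is kept untouched
  df %>% filter(!is_target),
  # Target rows: one earliest row per person, all columns preserved
  df %>%
    filter(is_target) %>%
    group_by(First_Name, Last_Name) %>%
    slice_min(Spec_Date, n = 1, with_ties = FALSE) %>%
    ungroup()
)
res
```

As in the output above, the deduplicated ABC/Colonized row ends up last, after the untouched rows.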
Upvotes: 1