RazK

Reputation: 13

Extract the row with the earliest date if it meets multiple conditions in R

I would like to filter my dataset based on the following conditions:

The resulting dataset should be as below:

Disease<-c("ABC","CRE","MCA","ABC","CRE","MCA")
Class<-c("Colonized","Clinical","Clinical","Clinical","Clinical","Clinical")
First_Name<-c("Roger","John","John","Mary","James","Lee")
Last_Name<-c("Smith","Doe","Doe","Poppins","Bond","Majors")
Spec_Date<-as.Date(c("2001-01-01","2003-01-01","2003-01-01","2001-01-01","2003-01-01","2004-01-01"))
df2<-data.frame(Disease,Class,First_Name,Last_Name,Spec_Date)

Any help is really appreciated.

Upvotes: 0

Views: 89

Answers (2)

GuedesBF

Reputation: 9878

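We need to group by all the involved variables, then keep only the rows where Spec_Date equals the group's min(Spec_Date). The per-operation `.by` argument needs dplyr 1.1.0 or later, and the native pipe `|>` needs R 4.1.0 or later: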

library(dplyr)

df |> 
    filter(Spec_Date == min(Spec_Date),
           .by = c(Disease, Class, First_Name, Last_Name))

# Or the long form, with `group_by`:

df |> 
    group_by(Disease, Class, First_Name, Last_Name) |> 
    filter(Spec_Date == min(Spec_Date)) |> 
    ungroup()

# A tibble: 6 × 5
  Disease Class     First_Name Last_Name Spec_Date 
  <chr>   <chr>     <chr>      <chr>     <date>    
1 ABC     Colonized Roger      Smith     2001-01-01
2 CRE     Clinical  John       Doe       2003-01-01
3 MCA     Clinical  John       Doe       2003-01-01
4 ABC     Clinical  Mary       Poppins   2001-01-01
5 CRE     Clinical  James      Bond      2003-01-01
6 MCA     Clinical  Lee        Majors    2004-01-01
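A self-contained version of the `.by` approach, run on a small hypothetical `df` (the question's input data is not shown, so these rows are invented for illustration). Roger Smith has two ABC/Colonized specimens, and only the earlier one should survive:

```r
library(dplyr)  # `.by` needs dplyr >= 1.1.0; `|>` needs R >= 4.1.0

# Hypothetical input: the question's `df` is not shown, so these rows are made up.
# Roger Smith has two ABC/Colonized rows with different dates.
df <- data.frame(
  Disease    = c("ABC", "ABC", "CRE"),
  Class      = c("Colonized", "Colonized", "Clinical"),
  First_Name = c("Roger", "Roger", "John"),
  Last_Name  = c("Smith", "Smith", "Doe"),
  Spec_Date  = as.Date(c("2001-01-01", "2002-06-01", "2003-01-01"))
)

# Within each Disease/Class/patient group, keep the row(s) with the earliest date
earliest <- df |>
  filter(Spec_Date == min(Spec_Date),
         .by = c(Disease, Class, First_Name, Last_Name))

earliest
```

The result keeps Roger Smith's 2001-01-01 row and John Doe's single row. Because `filter()` compares against the group minimum, several rows sharing the same earliest date would all be kept.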

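For keeping the rows with the minimum (or maximum) value, the even simpler slice_min() (or slice_max()) also works. Note that the slice_*() functions take `by` without the leading dot: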

df |> 
    slice_min(Spec_Date,
              by = c(Disease, Class, First_Name, Last_Name))
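One thing worth checking with slice_min() is how it treats ties. The sketch below uses hypothetical data in which John Doe has two CRE/Clinical specimens on the same earliest date; `with_ties` and `n` are documented slice_min() arguments:

```r
library(dplyr)  # `by` in slice_min() needs dplyr >= 1.1.0

# Hypothetical input with a tie: two rows share the earliest date 2003-01-01
df <- data.frame(
  Disease    = c("CRE", "CRE", "CRE"),
  Class      = c("Clinical", "Clinical", "Clinical"),
  First_Name = c("John", "John", "John"),
  Last_Name  = c("Doe", "Doe", "Doe"),
  Spec_Date  = as.Date(c("2003-01-01", "2003-01-01", "2004-01-01"))
)

# Default (n = 1, with_ties = TRUE): every row tied at the minimum is kept
with_ties <- df |>
  slice_min(Spec_Date, by = c(Disease, Class, First_Name, Last_Name))

# with_ties = FALSE: exactly one row per group, even when dates tie
one_row <- df |>
  slice_min(Spec_Date, n = 1, with_ties = FALSE,
            by = c(Disease, Class, First_Name, Last_Name))
```

Here `with_ties` has two rows and `one_row` has one. The default matches the filter() approach above; use `with_ties = FALSE` if each patient/disease/class combination must appear exactly once.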

Upvotes: 1

asd-tm

Reputation: 5263

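Do you mean this? Rows that are not ABC/Colonized are kept as they are, and the ABC/Colonized rows are collapsed to each patient's earliest date.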

EDIT

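library(dplyr)

union_all(
  # Keep every row that is not ABC/Colonized unchanged
  df %>% 
    filter(Disease != "ABC" | Class != "Colonized"),
  # For ABC/Colonized rows, keep only each patient's earliest date
  df %>% 
    filter(Disease == "ABC" & Class == "Colonized") %>% 
    group_by(First_Name, Last_Name) %>% 
    summarise(Disease = "ABC",
              Class = "Colonized",
              Spec_Date = min(Spec_Date) %>% as.Date(),
              .groups = "drop")  # return an ungrouped result, no message
)
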
  Disease     Class First_Name Last_Name  Spec_Date
1     CRE  Clinical       John       Doe 2003-01-01
2     MCA  Clinical       John       Doe 2003-01-01
3     ABC  Clinical       Mary   Poppins 2001-01-01
4     CRE  Clinical      James      Bond 2003-01-01
5     MCA  Clinical        Lee    Majors 2004-01-01
6     ABC Colonized      Roger     Smith 2001-01-01
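The same selection can also be sketched as a single filter() call, again on hypothetical data: rows outside ABC/Colonized pass straight through, and within ABC/Colonized only the earliest date per patient survives. This is a swapped-in alternative, not the answer's exact method; unlike summarise(), it keeps all rows tied at the earliest date and every other column of `df`.

```r
library(dplyr)  # `.by` needs dplyr >= 1.1.0

# Hypothetical input: two ABC/Colonized rows for Roger Smith and
# two CRE/Clinical rows for John Doe on different dates
df <- data.frame(
  Disease    = c("ABC", "ABC", "CRE", "CRE"),
  Class      = c("Colonized", "Colonized", "Clinical", "Clinical"),
  First_Name = c("Roger", "Roger", "John", "John"),
  Last_Name  = c("Smith", "Smith", "Doe", "Doe"),
  Spec_Date  = as.Date(c("2001-01-01", "2002-06-01", "2003-01-01", "2005-01-01"))
)

# Non-ABC/Colonized rows satisfy the first condition and are always kept;
# ABC/Colonized rows are kept only if they hold the group's earliest date
result <- df |>
  filter(!(Disease == "ABC" & Class == "Colonized") | Spec_Date == min(Spec_Date),
         .by = c(Disease, Class, First_Name, Last_Name))
```

Both of John Doe's CRE rows remain, while only Roger Smith's 2001-01-01 row is kept.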

Upvotes: 1
