Subset Dataframe of Duplicate Values by Max Date

Question

I have a sample dataset below:

df

      Date    Col1    Col2    Col3   Col4   Col5
2019-02-25       A       A       B      4      7
2019-02-26       A       A       B     10      9
2019-02-24       B       C       D      9     12
2019-02-23       J       M       L     19     18
2019-02-23       B       C       D     11     78

How do I subset my dataset so that if there are duplicates from df[,2:4], I only keep the max Date out of those duplicates? I still want the unique values from the previous table to be present.

Final expected output:

      Date    Col1    Col2    Col3   Col4   Col5
2019-02-26       A       A       B     10      9
2019-02-24       B       C       D      9     12
2019-02-23       J       M       L     19     18

akrun · Accepted Answer

We can use tidyverse. After grouping by columns 2 to 4, slice the row based on the index of maximum 'Date'

library(dplyr)
df %>%
  group_by_at(2:4) %>% 
  slice(which.max(Date))

Or using base R

df[with(df, ave(Date, Col1, Col2, Col3, FUN = max) == Date),]

Subset Dataframe of Duplicate Values by Max Date

Answers (2)

Related Questions