Nick Knauer
Nick Knauer

Reputation: 4243

Subset Dataframe of Duplicate Values by Max Date

I have a sample dataset below:

df

      Date    Col1    Col2    Col3   Col4   Col5
2019-02-25       A       A       B      4      7
2019-02-26       A       A       B     10      9
2019-02-24       B       C       D      9     12
2019-02-23       J       M       L     19     18
2019-02-23       B       C       D     11     78

How do I subset my dataset so that if there are duplicates from df[,2:4], I only keep the max Date out of those duplicates? I still want the unique values from the previous table to be present.

Final expected output:

      Date    Col1    Col2    Col3   Col4   Col5
2019-02-26       A       A       B     10      9
2019-02-24       B       C       D      9     12
2019-02-23       J       M       L     19     18

Upvotes: 1

Views: 429

Answers (2)

akrun
akrun

Reputation: 887118

We can use tidyverse. After grouping by columns 2 to 4, slice the row based on the index of maximum 'Date'

library(dplyr)
df %>%
  group_by_at(2:4) %>% 
  slice(which.max(Date))

Or using base R

df[with(df, ave(Date, Col1, Col2, Col3, FUN = max) == Date),]

Upvotes: 2

Darren Tsai
Darren Tsai

Reputation: 35554

A built-in solution which uses aggregate() in stats:

merge(df, aggregate(Date ~ Col1 + Col2 + Col3, df, max))

Upvotes: 1

Related Questions