Reputation: 4243
I have a sample dataset below:
df
Date Col1 Col2 Col3 Col4 Col5
2019-02-25 A A B 4 7
2019-02-26 A A B 10 9
2019-02-24 B C D 9 12
2019-02-23 J M L 19 18
2019-02-23 B C D 11 78
How do I subset my dataset so that if there are duplicates from df[,2:4]
, I only keep the max Date out of those duplicates? I still want the unique values from the previous table to be present.
Final expected output:
Date Col1 Col2 Col3 Col4 Col5
2019-02-26 A A B 10 9
2019-02-24 B C D 9 12
2019-02-23 J M L 19 18
Upvotes: 1
Views: 429
Reputation: 887118
We can use tidyverse
. After grouping by columns 2 to 4, slice
the row based on the index of max
imum 'Date'
library(dplyr)
df %>%
group_by_at(2:4) %>%
slice(which.max(Date))
Or using base R
df[with(df, ave(Date, Col1, Col2, Col3, FUN = max) == Date),]
Upvotes: 2
Reputation: 35554
A built-in solution which uses aggregate()
in stats
:
merge(df, aggregate(Date ~ Col1 + Col2 + Col3, df, max))
Upvotes: 1