r aggregate max when by parameter is based on multiple columns

I have a dataset as follows

 Id       Date         Subject       Score
 12221    08/01/2007  Math          89
 12221    08/01/2007  Math          92
 12221    08/01/2007  Math          78
 12221    11/01/2007  Math          36
 12221    11/01/2007  Math          45
 12221    11/01/2007  Math          24
 2856     03/18/2004  Science       56
 2856     03/18/2004  Science       49
 2856     03/18/2004  Science       84

I am trying to retain only rows where the Score is maximum for a combination of Id, Date and Subject for example the final output should look like this below

 Id       Date         Subject       Score
 12221    08/01/2007  Math          92
 12221    11/01/2007  Math          45
 2856     03/18/2004  Science       84

I tried aggregate function,

aggregate(score ~ list(Id,Date,Subject), df, max)

this did not work, tried dcast , which.max etc neither of these are producing the desired results, any help on resolving this issue is much appreciated.

Upvotes: 1

Views: 1526

Answers (1)

akrun
akrun

Reputation: 887391

We can use aggregate

aggregate(Score~., df1, FUN= max)
#        Id       Date Subject Score
#1 12221 08/01/2007    Math    92
#2 12221 11/01/2007    Math    45
#3  2856 03/18/2004 Science    84

Or with dplyr

library(dplyr)
df1 %>%
   group_by(Id, Date, Subject) %>%
   summarise(Score= max(Score))

Or using data.table

library(data.table)
setDT(df1)[, list(Score= max(Score)), by = .(Id, Date, Subject)] 

Upvotes: 2

Related Questions