Reputation: 585

Removing duplicate rows based on column while keeping the highest value of the next column

I'd like to remove the duplicates from column x1 and x2 while keeping the higher value from x3.

DF:

Expected result:

x1  x2  x3 
 1   1   3
 2   2   5

I've gotten as far as df[!duplicated(df[,c(1,2)]),] but it's displaying the lowest value of x3. I'd like to get the highest x3 value.

Thanks ahead of time.

Upvotes: 3

Answers (3)

Reputation: 9618

Yet another alternative with data.table:

library(data.table)
dt <- data.table(DF)

dt[,.SD[which.max(x3)],by=list(x1, x2)]

   x1 x2 x3
1:  1  1  3
2:  2  2  5

Upvotes: 4

Reputation: 99331

You could aggregate(), using the first two columns for grouping

aggregate(x3 ~ x1 + x2, df, max)
#   x1 x2 x3
# 1  1  1  3
# 2  2  2  5

If you want to find the max in more than one column, you can add variables to the left hand side of the formula with cbind(). For example,

aggregate(cbind(x3, x4, x5) ~ x1 + x2, df, max)

Upvotes: 7

Reputation: 23014

Using the dplyr package:

library(dplyr)
df %>% group_by(x1,x2) %>% summarise(x3 = max(x3))

You could title the maximum variable "maxOfx3" or similar for clarity.

Edit: If you have additional variables whose maxima you want, you can include them in the summarise() call:

df %>% group_by(x1,x2) %>% summarise(x3 = max(x3), x4 = max(x4), avg_of_x5 = mean(x5)) etc.

Upvotes: 5