Reputation: 1769

Remove duplicate rows based on one column's values, keeping the row whose entry in another column is maximum

I have the following matrix

> mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
> mat
      [,1] [,2]
[1,]     9    6
[2,]    10    6
[3,]    11    7
[4,]    12    7
[5,]    12    8
[6,]    12    9
[7,]    12   10
[8,]    12   11
[9,]    12   12
[10,]   13   12

I would like to remove duplicate rows based on the first column's values and keep the row whose entry in the second column is maximum. E.g. for the example above, the desired outcome is

      [,1] [,2]
[1,]     9    6
[2,]    10    6
[3,]    11    7
[4,]    12   12
[5,]    13   12

I tried with

> mat[!duplicated(mat[,1]),]

but I obtained

     [,1] [,2]
[1,]    9    6
[2,]   10    6
[3,]   11    7
[4,]   12    7
[5,]   13   12

which differs from the desired outcome in entry [4,2]. Suggestions?

Upvotes: 0

Answers (3)

geekzeus

Reputation: 895

First sort, then keep only the first row for each duplicate:

mat <- mat[order(mat[,1], mat[,2]),]
mat[!duplicated(mat[,1]),]

EDIT: Sorry, I thought your desired result was the last output shown. OK, so you want the max value:

mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))

#Reverse sort
mat <- mat[order(mat[,1], mat[,2], decreasing=TRUE),]
#Keep only the first row for each duplicate, this will give the largest values
mat <- mat[!duplicated(mat[,1]),]
#finally sort it
mat <- mat[order(mat[,1], mat[,2]),]
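
As a possible variation on the steps above, the final re-sort can be avoided by sorting column 1 ascending and column 2 descending in one call. This is only a sketch: it assumes an R version in which order() accepts a per-key decreasing vector, which requires method = "radix". The names idx, sorted and res are illustrative.

```r
# Same example matrix as in the question.
mat <- rbind(c(9,6), c(10,6), c(11,7), c(12,7), c(12,8),
             c(12,9), c(12,10), c(12,11), c(12,12), c(13,12))

# Column 1 ascending, column 2 descending, in a single call.
# A per-key `decreasing` vector is only supported with method = "radix".
idx <- order(mat[,1], mat[,2], decreasing = c(FALSE, TRUE), method = "radix")
sorted <- mat[idx, ]

# The first row of each column-1 group now holds the column-2 maximum.
res <- sorted[!duplicated(sorted[,1]), ]
res
```

Unlike negating a column, this form should also work when the second key is not numeric (for example, a character column in a data frame).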

Upvotes: 1

AkselA

Reputation: 8836

Like Joseph's solution, but if you add row names first you can restore the original row order afterwards (which will be the same as the sorted order in this case).

rownames(mat) <- 1:nrow(mat)

mat <- mat[order(mat[,1], -mat[,2]),]

mat <- mat[!duplicated(mat[,1]),]
mat[order(as.numeric(rownames(mat))),]
#    [,1] [,2]
# 1     9    6
# 2    10    6
# 3    11    7
# 9    12   12
# 10   13   12
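
If the row names are not needed for anything else, the same order-preserving idea can be sketched with an index vector instead. This is a minimal sketch, not part of the answer above; idx, keep and res are illustrative names.

```r
# Same example matrix as in the question.
mat <- rbind(c(9,6), c(10,6), c(11,7), c(12,7), c(12,8),
             c(12,9), c(12,10), c(12,11), c(12,12), c(13,12))

# Row positions sorted by column 1 ascending, then column 2 descending.
idx <- order(mat[,1], -mat[,2])

# For each column-1 value keep the first position in that ordering,
# i.e. the original row index of the column-2 maximum.
keep <- idx[!duplicated(mat[idx, 1])]

# Sorting the kept indices restores the original row order.
res <- mat[sort(keep), , drop = FALSE]
res
```

Here keep holds positions in the original matrix, so the input itself is never reordered.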

Upvotes: 1

josephjscheidt

Reputation: 326

You can sort the matrix first, using ascending order for column 1 and descending order for column 2. Filtering with !duplicated then keeps only the first row for each column 1 value, which after this sort is the row with the maximum column 2 value.

mat <- mat[order(mat[,1],-mat[,2]),]

mat[!duplicated(mat[,1]),]

         [,1] [,2]
    [1,]    9    6
    [2,]   10    6
    [3,]   11    7
    [4,]   12   12
    [5,]   13   12
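
The sort-then-filter approach can be wrapped in a small function. This is a sketch under assumptions: keep_max_row and its key and value parameters are hypothetical names, and both columns are assumed to be numeric so that negation works. Adding drop = FALSE keeps the result a matrix even when only one row survives.

```r
# Hypothetical helper: for each distinct value in column `key`,
# keep the row with the largest value in column `value`.
keep_max_row <- function(m, key = 1, value = 2) {
  # Sort by key ascending, then value descending.
  sorted <- m[order(m[, key], -m[, value]), , drop = FALSE]
  # The first row of each key group now holds the maximum.
  sorted[!duplicated(sorted[, key]), , drop = FALSE]
}

mat <- rbind(c(9,6), c(10,6), c(11,7), c(12,7), c(12,8),
             c(12,9), c(12,10), c(12,11), c(12,12), c(13,12))
res <- keep_max_row(mat)

# With a single surviving row the result is still a 1 x 2 matrix.
one <- keep_max_row(rbind(c(12,7), c(12,12)))
```

Without drop = FALSE, the single-row case would be simplified to a plain vector, which can break later code that expects a matrix.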

Upvotes: 3
