Remove duplicate rows based on conditions from multiple columns (decreasing order) in R

Question

I have a three-column data.frame (variables: ID.A, ID.B, DISTANCE). I would like to remove duplicates under one condition: keep the row with the smallest value in the third column.

It is the same problem as here: R, conditionally remove duplicate rows (a similar one: Remove duplicates based on 2nd column condition)

But in my situation there is a second problem: I have to remove rows when the pair (ID.A, ID.B) is duplicated, not only when ID.A is duplicated.

I tried several things, such as:

df <- ddply(df, 1:3, function(df) return(df[df$DISTANCE==min(df$DISTANCE),]))

but it didn't work.

Example:

This dataset

  id.a id.b dist
1    1    1   12
2    1    1   10
3    1    1    8
4    2    1   20
5    1    1   15
6    3    1   16

Should become:

  id.a id.b dist
3    1    1    8
4    2    1   20
6    3    1   16

Alex · Accepted Answer

Using dplyr, with a suitable modification of the approach in Remove duplicated rows using dplyr:

library(dplyr)

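df %>%
  group_by(id.a, id.b) %>%   # one group per (id.a, id.b) pair
  arrange(dist) %>%          # sort by distance, ascending (arrange() ignores grouping by default)
  filter(row_number() == 1)  # keep the first row of each group, i.e. the smallest dist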
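
For anyone who wants to check the idea without dplyr, here is a minimal base R sketch of the same sort-then-keep-first approach. It is not part of the original answer. It rebuilds the sample data from the question, and the names `sorted` and `result` are only illustrative. It also assumes the default integer row names, so that the original row order can be restored at the end.

```r
# Sample data from the question
df <- data.frame(
  id.a = c(1, 1, 1, 2, 1, 3),
  id.b = c(1, 1, 1, 1, 1, 1),
  dist = c(12, 10, 8, 20, 15, 16)
)

# Sort by distance so the smallest value of each (id.a, id.b) pair comes first
sorted <- df[order(df$dist), ]

# duplicated() flags every occurrence of a pair after the first one; drop those
result <- sorted[!duplicated(sorted[, c("id.a", "id.b")]), ]

# Restore the original row order (assumes default integer row names)
result <- result[order(as.integer(rownames(result))), ]

print(result)
#   id.a id.b dist
# 3    1    1    8
# 4    2    1   20
# 6    3    1   16
```

If two rows of the same pair tie on the smallest distance, this keeps only one of them (the one that appears first in df), which is also what filter(row_number() == 1) does.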
