Spes Alpha

Remove duplicate rows based on conditions from multiple columns (decreasing order) in R

I have a three-column data.frame (variables: ID.A, ID.B, DISTANCE). I would like to remove duplicates under one condition: keep the row with the smallest value in column 3 (DISTANCE).

It is the same problem as in "R, conditionally remove duplicate rows" (a similar one: "Remove duplicates based on 2nd column condition").

But in my situation there is a second complication: a row counts as a duplicate when the pair (ID.A, ID.B) is repeated, not only when ID.A is repeated.
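To make that duplicate definition concrete, here is a minimal base R sketch. It assumes the example dataset from this question, with the lowercase column names id.a, id.b and dist, and compares `duplicated()` on all three columns with `duplicated()` on the (id.a, id.b) pair only:

```r
# Example data from the question (lowercase column names assumed).
df <- data.frame(id.a = c(1, 1, 1, 2, 1, 3),
                 id.b = c(1, 1, 1, 1, 1, 1),
                 dist = c(12, 10, 8, 20, 15, 16))

# Every dist value differs, so no row repeats all three columns.
dup_all <- duplicated(df[, c("id.a", "id.b", "dist")])

# Rows 2, 3 and 5 repeat the (id.a, id.b) pair first seen in row 1.
dup_pair <- duplicated(df[, c("id.a", "id.b")])

which(dup_all)   # integer(0)
which(dup_pair)  # 2 3 5
```

So the duplicates only show up once dist is left out of the key.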

I tried several things, such as:

df <- ddply(df, 1:3, function(df) return(df[df$DISTANCE==min(df$DISTANCE),]))

but it didn't work
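A likely reason the ddply attempt fails: grouping on columns 1:3 includes DISTANCE itself, and since no two rows in the example share all three values, each group would hold a single row that is trivially its own minimum (note also that the example names the column dist, not DISTANCE). Below is a minimal base R sketch of the same split-apply-combine idea, grouping on the key pair only; the names `groups`, `kept` and `result` are just for illustration. With plyr, the analogous change would presumably be to group on c("id.a", "id.b") instead of 1:3.

```r
df <- data.frame(id.a = c(1, 1, 1, 2, 1, 3),
                 id.b = c(1, 1, 1, 1, 1, 1),
                 dist = c(12, 10, 8, 20, 15, 16))

# Split on the key pair only; drop = TRUE skips (id.a, id.b)
# combinations that never occur in the data.
groups <- split(df, df[, c("id.a", "id.b")], drop = TRUE)

# In each group keep the row(s) whose dist equals the group minimum.
kept <- lapply(groups, function(d) d[d$dist == min(d$dist), ])

# Combine the groups back into one data.frame.
result <- do.call(rbind, kept)
rownames(result) <- NULL
result
```

Because it tests `dist == min(dist)`, this version keeps every tied minimum in a group, just as the original ddply function would.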

Example:

This dataset

  id.a id.b dist
1    1    1   12
2    1    1   10
3    1    1    8
4    2    1   20
5    1    1   15
6    3    1   16

Should become:

  id.a id.b dist
3    1    1    8
4    2    1   20
6    3    1   16
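Since the expected output is just the minimum dist for each (id.a, id.b) pair, base R's `aggregate()` can compute it directly in this three-column case. A sketch, with the caveat that it returns fresh row names and would drop any other columns, so it only fits when the table has exactly these three:

```r
df <- data.frame(id.a = c(1, 1, 1, 2, 1, 3),
                 id.b = c(1, 1, 1, 1, 1, 1),
                 dist = c(12, 10, 8, 20, 15, 16))

# One row per (id.a, id.b) pair, holding the smallest dist in that pair.
result <- aggregate(dist ~ id.a + id.b, data = df, FUN = min)
result
```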

Answers (2)

Prradep

Another way to get the same result while retaining all the columns:

library(dplyr)
df %>% arrange(dist) %>%                  # smallest distance first
  distinct(id.a, id.b, .keep_all = TRUE)  # keep the first row of each (id.a, id.b) pair

#   id.a id.b dist
# 1    1    1    8
# 2    3    1   16
# 3    2    1   20
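For readers without dplyr, a base R sketch of the same sort-then-deduplicate idea: `order()` puts the smallest dist first, and `duplicated()` flags every later row whose (id.a, id.b) pair has already been seen, so the surviving row of each pair is its minimum.

```r
df <- data.frame(id.a = c(1, 1, 1, 2, 1, 3),
                 id.b = c(1, 1, 1, 1, 1, 1),
                 dist = c(12, 10, 8, 20, 15, 16))

# Sort all rows by ascending distance.
sorted <- df[order(df$dist), ]

# Keep only the first (i.e. smallest-distance) row of each pair.
result <- sorted[!duplicated(sorted[, c("id.a", "id.b")]), ]
result
```

As in the dplyr output, the rows come back sorted by dist (original row names 3, 6, 4) and all columns are kept.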

Alex

Using dplyr, with a suitable modification of the approach in "Remove duplicated rows using dplyr":

library(dplyr)

df %>%
  group_by(id.a, id.b) %>%
  arrange(dist) %>%          # sort all rows by ascending distance (arrange ignores groups by default)
  filter(row_number() == 1)  # within each (id.a, id.b) group, keep the first, i.e. smallest, row
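One design point worth noting: `filter(row_number() == 1)` keeps exactly one row per (id.a, id.b) pair, so if two rows tie for the minimum distance only one survives, whereas the `dist == min(dist)` test from the question's ddply attempt would keep both. A small base R sketch with a hypothetical tied dataset (`ties` is made up for illustration) shows the difference:

```r
# Hypothetical data: the (1, 1) pair has two rows tied at dist 5.
ties <- data.frame(id.a = c(1, 1, 2), id.b = c(1, 1, 1), dist = c(5, 5, 7))

# A single grouping key per row avoids empty (id.a, id.b) combinations.
key <- paste(ties$id.a, ties$id.b)

# Keep every row whose dist equals its group's minimum (ties survive).
all_min <- ties[ties$dist == ave(ties$dist, key, FUN = min), ]

# Keep exactly one row per group, like filter(row_number() == 1).
sorted <- ties[order(ties$dist), ]
first_only <- sorted[!duplicated(sorted[, c("id.a", "id.b")]), ]

nrow(all_min)     # 3
nrow(first_only)  # 2
```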
