Reputation: 27
I have a three-column data.frame (variables: ID.A, ID.B, DISTANCE). I would like to remove the duplicates under a condition: keep the row with the smallest value in column 3.
It is the same problem as here: R, conditionally remove duplicate rows (a similar one: Remove duplicates based on 2nd column condition).
But in my situation there is a second problem: I have to remove rows when the combination (ID.A, ID.B, DISTANCE) is duplicated, and not only when ID.A is duplicated.
I tried several things, such as:
df <- ddply(df, 1:3, function(df) return(df[df$DISTANCE==min(df$DISTANCE),]))
but it didn't work.
Example:
This dataset:
id.a id.b dist
1 1 1 12
2 1 1 10
3 1 1 8
4 2 1 20
5 1 1 15
6 3 1 16
Should become:
id.a id.b dist
3 1 1 8
4 2 1 20
6 3 1 16
Upvotes: 0
Views: 4139
Reputation: 5716
Another way of achieving the solution and retaining all the columns:
library(dplyr)
df %>%
  arrange(dist) %>%                        # smallest dist first
  distinct(id.a, id.b, .keep_all = TRUE)   # keep the first row of each (id.a, id.b) pair
# id.a id.b dist
# 1 1 1 8
# 2 3 1 16
# 3 2 1 20
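Note that the result above comes back ordered by dist rather than by ID. As a minimal, reproducible sketch (the data frame is copied from the question's example; the final arrange() call and the name `result` are my additions for illustration, not part of the answer), the same pipeline can be followed by a second sort if the ID order matters:

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  id.a = c(1, 1, 1, 2, 1, 3),
  id.b = c(1, 1, 1, 1, 1, 1),
  dist = c(12, 10, 8, 20, 15, 16)
)

result <- df %>%
  arrange(dist) %>%                          # smallest dist first
  distinct(id.a, id.b, .keep_all = TRUE) %>% # first row per (id.a, id.b) pair
  arrange(id.a, id.b)                        # assumption: restore ordering by ID

print(result)
```

The extra sort only changes the row order, not which rows are kept.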
Upvotes: 2
Reputation: 15708
Using dplyr, and a suitable modification of "Remove duplicated rows using dplyr":
library(dplyr)
df %>%
  group_by(id.a, id.b) %>%
  arrange(dist) %>% # in each group, arrange in ascending order by distance
  filter(row_number() == 1)
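For comparison, the same "sort, then keep the first row per pair" idea can be sketched in base R without any packages. This is not from the answers above; it is an illustrative alternative, with the data frame copied from the question's example and the names `sorted` and `result` chosen here for the sketch:

```r
# Example data copied from the question
df <- data.frame(
  id.a = c(1, 1, 1, 2, 1, 3),
  id.b = c(1, 1, 1, 1, 1, 1),
  dist = c(12, 10, 8, 20, 15, 16)
)

# Sort by distance so the smallest dist of each pair comes first
sorted <- df[order(df$dist), ]

# duplicated() flags every repeat of an (id.a, id.b) pair after its first
# occurrence; negating it keeps only the smallest-distance row per pair
result <- sorted[!duplicated(sorted[c("id.a", "id.b")]), ]

print(result)
```

A side effect of this approach is that the original row names are preserved, so the kept rows can be traced back to their positions in the input.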
Upvotes: 3