Reputation: 69
I want to remove duplicates based on similarities or differences of other columns.
All the duplicated ID should be completely removed but just if they have DIFFERENT colours. It doesn't matter if they have different subgroups as well. If they have the same ID AND the same colour, just the first one should be kept.
At the end I want to have a list of all ID which are single-colour only (independent of subgroup). All the multicoloured ID should be removed.
Here and example:
id colour subgroup
1 1 red lightred
2 2 blue lightblue
3 2 blue darkblue
4 3 red lightred
5 4 red darkred
6 4 red darkred
7 4 blue lightblue
8 5 green darkgreen
9 5 green darkgreen
10 5 green lightgreen
11 6 red darkred
12 6 blue darkblue
13 6 green lightgreen
At the end it should look like this:
id colour subgroup
1 1 red lightred
2 2 blue lightblue
4 3 red lightred
8 5 green darkgreen
The data I used for this example:
id = c(1,2,2,3,4,4,4,5,5,5,6,6,6)
colour = c("red","blue","blue","red","red","red","blue","green","green","green","red","blue","green")
subgroup = c("lightred","lightblue","darkblue","lightred","darkred","darkred","lightblue","darkgreen","darkgreen","lightgreen","darkred","darkblue","lightgreen")
data = data.frame(cbind(id,colour,subgroup))
Thanks for your help!
Upvotes: 1
Views: 81
Reputation: 2528
I got a small data.table
solution. It first filters on all non-duplicated id
, colour
combinations and then selects all combinations, where only one id
, colour
combination exists.
library(data.table)
dt.data <- data.table(data)
dt.data[!duplicated(dt.data, by = c("id", "colour"))
,.(colour, subgroup, .N)
, by = list(id)][N==1, .(id
, colour
, subgroup)]
Upvotes: 0
Reputation: 79228
library(tidyverse)
data%>%
group_by(id)%>%
filter(1==length(unique(colour)),!duplicated(colour))
# A tibble: 4 x 3
# Groups: id [4]
id colour subgroup
<fct> <fct> <fct>
1 1 red lightred
2 2 blue lightblue
3 3 red lightred
4 5 green darkgreen
Using Base R:
subset(data,as.logical(ave(colour,id,FUN=function(x)length(unique(x))==1& !duplicated(x))))
id colour subgroup
1 1 red lightred
2 2 blue lightblue
4 3 red lightred
8 5 green darkgreen
Upvotes: 2