Reputation: 578
I have a dataframe of sets containing different colours. If duplicates exist within the set then I want to delete the whole set.
For instance, in the following example data, set 1 contains the colours red, red, yellow, so I want to delete set 1.
Set Colour
Set1 red
Set1 red
Set1 yellow
Set2 green
Set2 blue
Set2 red
Set3 yellow
Set3 yellow
Set3 blue
Set3 yellow
I only want to keep set 2 as it only contains colours that appear once in the group.
Data:
df <- structure(list(Set = c("Set1", "Set1", "Set1", "Set2", "Set2",
"Set2", "Set3", "Set3", "Set3", "Set3"), Colour = c("red", "red",
"yellow", "green", "blue", "red", "yellow", "yellow", "blue",
"yellow")), class = "data.frame", row.names = c(NA, -10L))
Upvotes: 0
Views: 70
Reputation: 33743
Using data.table:
library(data.table)
setDT(df)
df <- df[, .SD[anyDuplicated(Colour)==0], by = Set]
# Set Colour
# 1: Set2 green
# 2: Set2 blue
# 3: Set2 red
# Convert back to data.frame with setDF(df)
Combining with ave() (inspired by Allan Cameron):
df[ave(Colour, Set, FUN=anyDuplicated)==0] # data.table
filter(df, ave(Colour, Set, FUN=anyDuplicated)==0) # dplyr
subset(df, ave(Colour, Set, FUN=anyDuplicated)==0) # Base R
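One caveat with the ave() one-liners above: because Colour is a character vector, ave() writes the integer result of anyDuplicated() back into a character vector, so the comparison with 0 only works through coercion. A minimal sketch that keeps the result integer by passing row indices instead (df is assumed to be the question's data; dup_idx and res are names introduced here for illustration):

```r
# Example data from the question, assumed to be stored as `df`
df <- data.frame(
  Set = c("Set1", "Set1", "Set1", "Set2", "Set2",
          "Set2", "Set3", "Set3", "Set3", "Set3"),
  Colour = c("red", "red", "yellow", "green", "blue",
             "red", "yellow", "yellow", "blue", "yellow"),
  stringsAsFactors = FALSE
)

# Pass row indices to ave() so the per-group result stays integer
dup_idx <- ave(seq_along(df$Colour), df$Set,
               FUN = function(i) anyDuplicated(df$Colour[i]))

# Keep only rows whose set has no duplicated colour
res <- df[dup_idx == 0, ]
```

Here res should contain only the three Set2 rows, the same as the versions above.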
Upvotes: 2
Reputation: 174586
In base R you could do:
subset(df, ave(Colour, Set, FUN=anyDuplicated) == 0)
#> Set Colour
#> 4 Set2 green
#> 5 Set2 blue
#> 6 Set2 red
(with thanks to sindri baldur for the improvement on my original)
or
subset(df, Set %in% names(which(tapply(Colour, Set, function(x) !any(duplicated(x))))))
#> Set Colour
#> 4 Set2 green
#> 5 Set2 blue
#> 6 Set2 red
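If more than one set can qualify, matching the set names with %in% is safer than ==, since == would recycle the vector of names against the rows. A rough sketch of the tapply() approach under that assumption (Set4 is hypothetical data added purely for illustration; df2, ok and res are illustrative names):

```r
# Example data from the question, assumed to be stored as `df`
df <- data.frame(
  Set = c("Set1", "Set1", "Set1", "Set2", "Set2",
          "Set2", "Set3", "Set3", "Set3", "Set3"),
  Colour = c("red", "red", "yellow", "green", "blue",
             "red", "yellow", "yellow", "blue", "yellow"),
  stringsAsFactors = FALSE
)

# Hypothetical extra set with no repeated colours (not in the question)
df2 <- rbind(df, data.frame(Set = "Set4", Colour = c("pink", "black"),
                            stringsAsFactors = FALSE))

# TRUE for each set whose colours are all distinct
ok <- tapply(df2$Colour, df2$Set, function(x) !any(duplicated(x)))

# %in% checks every row against all qualifying set names
res <- subset(df2, Set %in% names(which(ok)))
```

With this data, res should hold the three Set2 rows followed by the two Set4 rows.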
or
do.call(rbind, lapply(split(df, df$Set),
function(x) if(nrow(x) == length(unique(x$Colour))) x))
#> Set Colour
#> Set2.4 Set2 green
#> Set2.5 Set2 blue
#> Set2.6 Set2 red
Upvotes: 1
Reputation: 8880
Try it this way:
library(tidyverse)
df %>%
group_by(Set) %>%
filter(n_distinct(Colour) == n())
Set Colour
<chr> <chr>
1 Set2 green
2 Set2 blue
3 Set2 red
Upvotes: 1
Reputation: 39613
Try this approach. You can compute the number of observations per Set and Colour in a new variable. Since you want the sets without duplicates, you can then use any() to flag every set in which some count is greater than one, and filter to keep only the sets where each colour appears once. Here is the code (I have used your data as df):
library(dplyr)
#Code
df %>% group_by(Set,Colour) %>%
mutate(N=n()) %>% ungroup() %>%
group_by(Set) %>%
mutate(Var=any(N>1)) %>%
filter(!Var) %>% select(-c(N,Var))
Output:
# A tibble: 3 x 2
# Groups: Set [1]
Set Colour
<chr> <chr>
1 Set2 green
2 Set2 blue
3 Set2 red
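For comparison, the same two-step logic (count rows per Set and Colour, then flag any set with a count above one) can be sketched in base R with ave(). This is only an illustrative equivalent, not part of the dplyr code above; n, flag and res are names introduced here:

```r
# Example data from the question, assumed to be stored as `df`
df <- data.frame(
  Set = c("Set1", "Set1", "Set1", "Set2", "Set2",
          "Set2", "Set3", "Set3", "Set3", "Set3"),
  Colour = c("red", "red", "yellow", "green", "blue",
             "red", "yellow", "yellow", "blue", "yellow"),
  stringsAsFactors = FALSE
)

# Step 1: number of rows for each Set/Colour combination
n <- ave(seq_along(df$Colour), df$Set, df$Colour, FUN = length)

# Step 2: flag every row of a set in which some colour occurs more than once
flag <- ave(n > 1, df$Set, FUN = any)

# Keep only the unflagged sets
res <- df[!flag, ]
```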
Upvotes: 0