Reputation: 578

r: Delete the whole group if it contains duplicate values

I have a data frame of sets, each containing colours. If a set contains any duplicate colour, I want to delete the whole set.

For instance, in the following example data, set 1 contains the colours red, red, yellow, so I want to delete set 1.

Set  Colour
Set1 red
Set1 red
Set1 yellow
Set2 green
Set2 blue
Set2 red
Set3 yellow
Set3 yellow
Set3 blue
Set3 yellow

I only want to keep set 2 as it only contains colours that appear once in the group.

Data:

structure(list(Set = c("Set1", "Set1", "Set1", "Set2", "Set2", 
"Set2", "Set3", "Set3", "Set3", "Set3"), Colour = c("red", "red", 
"yellow", "green", "blue", "red", "yellow", "yellow", "blue", 
"yellow")), class = "data.frame", row.names = c(NA, -10L))

Upvotes: 0

Answers (4)

s_baldur

Reputation: 33743

Using data.table:

library(data.table)
setDT(df)


df <- df[, .SD[anyDuplicated(Colour)==0], by = Set]
#     Set Colour
# 1: Set2  green
# 2: Set2   blue
# 3: Set2    red


# Convert back to data.frame with setDF(df)

Combining with ave(), inspired by Allan Cameron:

df[ave(Colour, Set, FUN=anyDuplicated)==0] # data.table
filter(df, ave(Colour, Set, FUN=anyDuplicated)==0) # dplyr
subset(df, ave(Colour, Set, FUN=anyDuplicated)==0) # Base R
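
To see why the ave() one-liners work, here is a minimal base R sketch that breaks the step apart. It assumes the question's data is stored as `df`; the names `dup_index`, `keep` and `result` are only illustrative.

```r
# Example data from the question, assigned to `df`
df <- data.frame(
  Set = c("Set1", "Set1", "Set1", "Set2", "Set2", "Set2",
          "Set3", "Set3", "Set3", "Set3"),
  Colour = c("red", "red", "yellow", "green", "blue", "red",
             "yellow", "yellow", "blue", "yellow")
)

# anyDuplicated() returns the index of the first duplicated element,
# or 0 when there is none; tapply() applies it once per Set
dup_index <- tapply(df$Colour, df$Set, anyDuplicated)

# ave() repeats the per-group value on every row of that group.
# Colour is character, so the values come back as strings ("0", "2");
# the comparison with 0 still works through coercion.
keep <- ave(df$Colour, df$Set, FUN = anyDuplicated) == 0

# Keep only rows whose group has no duplicates
result <- df[keep, ]
```

Here `dup_index` is 0 only for Set2, so `keep` is TRUE only on the Set2 rows.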

Upvotes: 2

Allan Cameron

Reputation: 174586

In base R you could do:

subset(df, ave(Colour, Set, FUN=anyDuplicated) == 0)
#>    Set Colour
#> 4 Set2  green
#> 5 Set2   blue
#> 6 Set2    red

(with thanks to sindri baldur for the improvement on my original)

subset(df, Set %in% names(which(tapply(Colour, Set, function(x) !any(duplicated(x))))))
#>    Set Colour
#> 4 Set2  green
#> 5 Set2   blue
#> 6 Set2    red

do.call(rbind, lapply(split(df, df$Set), 
                      function(x) if(nrow(x) == length(unique(x$Colour))) x))
#>         Set Colour
#> Set2.4 Set2  green
#> Set2.5 Set2   blue
#> Set2.6 Set2    red
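
In the example data only one set is free of duplicates. When several sets qualify, selecting by membership with `%in%` is the safer choice, because `==` against a vector of names recycles that vector element by element. A small sketch with made-up data (`df2` and its sets A, B, C are hypothetical, not from the question):

```r
# Hypothetical data: sets A and C have no duplicates, set B does
df2 <- data.frame(
  Set = c("A", "A", "B", "B", "C", "C"),
  Colour = c("red", "blue", "red", "red", "green", "yellow")
)

# TRUE for each set without duplicated colours
ok <- tapply(df2$Colour, df2$Set, function(x) !anyDuplicated(x))

# Names of the sets to keep
keep_sets <- names(which(ok))

# Membership test keeps every row of every qualifying set
result2 <- df2[df2$Set %in% keep_sets, ]

# For comparison: `==` recycles keep_sets along the rows and matches fewer rows
n_matched_by_eq <- sum(df2$Set == keep_sets)
```

With this data `%in%` keeps all four rows of sets A and C, while the `==` comparison matches only two of them.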

Upvotes: 1

Yuriy Saraykin

Reputation: 8880

Try it this way:

library(tidyverse)
df %>% 
  group_by(Set) %>% 
  filter(n_distinct(Colour) == n())


  Set   Colour
  <chr> <chr> 
1 Set2  green 
2 Set2  blue  
3 Set2  red

Upvotes: 1

Duck

Reputation: 39613

Try this approach. Count the observations for each combination of Set and Colour in a new variable N. Then, within each Set, use any() to flag whether any count is greater than one, and keep only the sets where no count exceeds one. Here is the code (I have used your data as df):

library(dplyr)
#Code
df %>% group_by(Set,Colour) %>%
  mutate(N=n()) %>% ungroup() %>%
  group_by(Set) %>%
  mutate(Var=any(N>1)) %>%
  filter(!Var) %>% select(-c(N,Var))

Output:

# A tibble: 3 x 2
# Groups:   Set [1]
  Set   Colour
  <chr> <chr> 
1 Set2  green 
2 Set2  blue  
3 Set2  red

Upvotes: 0
