Isuru

Reputation: 101

Remove rows based on duplicate elements in a character vector

I have a data frame like this. The column y contains 3 or more characters separated by commas (,). I want to remove a row if all of its characters are the same.

x <- c(1, 2, 3, 4, 5)
y <- c("a,a,a", "a,a,b,c", "b,c,a", "b,b,b,b", "a,b,b,c")
df <- data.frame(x, y)

The desired output is:

x <- c(2, 3, 5)
y <- c("a,a,b,c", "b,c,a", "a,b,b,c")
df <- data.frame(x, y)

Upvotes: 1

Views: 32

Answers (1)

Ronak Shah

Reputation: 388907

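You can use separate_rows to split the comma-separated values into separate rows, drop the groups that contain only one distinct value, and then summarise the data back into one row per x. Note that toString joins the values with ", " (comma and space), so the strings in y are rebuilt rather than kept exactly as they were.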

library(dplyr)

df %>%
  tidyr::separate_rows(y) %>%
  group_by(x) %>%
  filter(n_distinct(y) > 1) %>%
  summarise(y = toString(y))

#      x y         
#  <dbl> <chr>     
#1     2 a, a, b, c
#2     3 b, c, a   
#3     5 a, b, b, c

In base R:

df[sapply(strsplit(df$y, ','), function(x) length(unique(x))) > 1, ]
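
The base R one-liner can also be read step by step. The following is only a minimal sketch using the question's data: the names parts, n_unique and result are introduced here for illustration, and the as.character() call is an added assumption to cover the case where y is a factor (the default for data.frame() before R 4.0).

```r
# Example data from the question
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c("a,a,a", "a,a,b,c", "b,c,a", "b,b,b,b", "a,b,b,c"),
  stringsAsFactors = FALSE
)

# Split each string on commas; as.character() guards against y being a factor
parts <- strsplit(as.character(df$y), ",", fixed = TRUE)

# Count the distinct elements in each row
n_unique <- vapply(parts, function(p) length(unique(p)), integer(1))

# Keep only rows with more than one distinct element
result <- df[n_unique > 1, ]
result
#   x       y
# 2 2 a,a,b,c
# 3 3   b,c,a
# 5 5 a,b,b,c
```

Unlike the separate_rows approach, this only subsets the rows, so the original strings in y are left untouched.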

Upvotes: 1
