Reputation: 101
I have a data frame like this. Column y contains 3 or more characters separated by commas (,). I want to remove a row if all of its characters are the same.
x <- c(1, 2, 3, 4, 5)
y <- c("a,a,a", "a,a,b,c", "b,c,a", "b,b,b,b", "a,b,b,c")
df <- data.frame(x, y)
The desired output is:
x <- c(2, 3, 5)
y <- c("a,a,b,c", "b,c,a", "a,b,b,c")
df <- data.frame(x, y)
Upvotes: 1
Reputation: 388907
You can use separate_rows to split the comma-separated values into different rows, remove the groups that have only one distinct value, and then summarise the data again.
library(dplyr)
df %>%
tidyr::separate_rows(y) %>%
group_by(x) %>%
filter(n_distinct(y) > 1) %>%
summarise(y = toString(y))
# x y
# <dbl> <chr>
#1 2 a, a, b, c
#2 3 b, c, a
#3 5 a, b, b, c
In base R:
df[sapply(strsplit(df$y, ','), function(x) length(unique(x))) > 1, ]
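As a quick check, here is a minimal self-contained sketch of the base R approach run on the question's example data. The names n_unique and result are illustrative only, and stringsAsFactors = FALSE is set explicitly on the assumption that y should stay a character column (before R 4.0, data.frame() would turn it into a factor by default, and strsplit() does not accept factors).

```r
# Example data from the question; keep y as character, not factor
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c("a,a,a", "a,a,b,c", "b,c,a", "b,b,b,b", "a,b,b,c"),
  stringsAsFactors = FALSE
)

# Count the distinct values in each comma-separated string
n_unique <- vapply(
  strsplit(df$y, ",", fixed = TRUE),
  function(parts) length(unique(parts)),
  integer(1)
)

# Keep only the rows that contain more than one distinct value
result <- df[n_unique > 1, ]
result
```

Because this filters the original rows rather than rebuilding the strings, y keeps its original form (for example "a,a,b,c") instead of the ", "-separated form that toString() produces.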
Upvotes: 1