Finding unique groups of rows in a data frame

Question

I have a data frame in which I want to compare groups of rows and remove a group only if it is entirely identical to another group. For example:

df <- data.frame(X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
                 Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1),
                 Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC"))

  X Y   Z
1 a 1 ABC
2 a 2 DEF
3 a 1 ABC
4 b 2 DEF
5 b 2 DEF
6 b 2 DEF
7 c 1 ABC
8 c 2 DEF
9 c 1 ABC

Here, each group is identified by column X, and I want to compare the groups with one another. Groups a and c are identical, so only one of them should be kept. The desired result is:

  GroupID Y   Z
1 1       1 ABC
2 1       2 DEF
3 1       1 ABC
4 2       2 DEF
5 2       2 DEF
6 2       2 DEF

Any idea how I can do this kind of comparison?

Henrik · Accepted Answer

A base R possibility:

# For each 'X', collapse 'Y' and 'Z' to a single string
l <- by(df[ , c("Y", "Z")], df$X, function(dat) paste0(dat, collapse = ""))

# Select the names of the groups whose string is not a duplicate
nm <- names(l)[!duplicated(l)]

# use these names to subset the data frame
df[df$X %in% nm, ]
#   X Y   Z
# 1 a 1 ABC
# 2 a 2 DEF
# 3 a 1 ABC
# 4 b 2 DEF
# 5 b 2 DEF
# 6 b 2 DEF
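
If you also want the running GroupID column from the question instead of X, the same idea can be extended roughly as below. This is only a sketch under a few assumptions: the helper names sig, keep and res are mine, the ":" and "|" separators are assumed not to occur in the data, and row order within a group is treated as significant. Note that split() orders groups by the sorted values of X, so the group kept from each set of identical groups is the first in that order.

```r
# Sample data from the question
df <- data.frame(
  X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1),
  Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC")
)

# One signature string per group: rows are pasted as "Y:Z" and joined with "|"
# (assumes neither separator appears in the data)
sig <- sapply(split(df[, c("Y", "Z")], df$X),
              function(g) paste(g$Y, g$Z, sep = ":", collapse = "|"))

# Keep only the first group for each distinct signature
keep <- names(sig)[!duplicated(sig)]
res <- df[df$X %in% keep, ]

# Replace X with a running integer id, as in the desired output
res$GroupID <- match(res$X, keep)
res <- res[, c("GroupID", "Y", "Z")]
rownames(res) <- NULL
res
#   GroupID Y   Z
# 1       1 1 ABC
# 2       1 2 DEF
# 3       1 1 ABC
# 4       2 2 DEF
# 5       2 2 DEF
# 6       2 2 DEF
```

If two groups should count as identical regardless of row order, one option would be to sort each group's rows before building the signature.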
