Rachit Agrawal

Reputation: 3343

Finding unique groups of rows in a data frame

I have a data frame in which I want to compare groups of rows and remove a group only if it is entirely identical to another group. For example:

df<-data.frame(X=c("a", "a", "a", "b", "b", "b", "c", "c", "c"), Y=c(1,2,1,2,2,2,1,2,1), Z=c("ABC","DEF","ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC"))

  X Y   Z
1 a 1 ABC
2 a 2 DEF
3 a 1 ABC
4 b 2 DEF
5 b 2 DEF
6 b 2 DEF
7 c 1 ABC
8 c 2 DEF
9 c 1 ABC

Here a group is identified by column X, and I want to compare the groups with each other. In this example, group a and group c are identical, so one of them should be dropped. The desired result is:

  GroupID Y   Z
1 1       1 ABC
2 1       2 DEF
3 1       1 ABC
4 2       2 DEF
5 2       2 DEF
6 2       2 DEF

Any idea how I can do this kind of comparison?

Upvotes: 3

Views: 142

Answers (2)

Henrik

Reputation: 67778

A base R possibility:

# For each 'X', collapse 'Y' and 'Z' into a single string
l <- by(df[ , c("Y", "Z")], df$X, function(dat) paste0(dat, collapse = ""))

# select names of unique list elements
nm <- names(l)[!duplicated(l)]

# use these names to subset the data frame
df[df$X %in% nm, ]
#   X Y   Z
# 1 a 1 ABC
# 2 a 2 DEF
# 3 a 1 ABC
# 4 b 2 DEF
# 5 b 2 DEF
# 6 b 2 DEF
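
If you also want the GroupID column shown in the question, here is one possible sketch. It is not part of the code above: the names sig, keep and res are made up for illustration, and it assumes that row order within a group matters when deciding whether two groups are identical.

```r
df <- data.frame(
  X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1),
  Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC")
)

# One signature string per group: paste each row's fields together,
# then join the rows in their original order
sig <- sapply(split(df[c("Y", "Z")], df$X),
              function(g) paste(g$Y, g$Z, sep = "|", collapse = ";"))

# Keep the first group seen for each distinct signature
keep <- names(sig)[!duplicated(sig)]
res <- df[df$X %in% keep, ]

# Replace the group label with a running integer ID (assumed naming: GroupID)
res$GroupID <- match(res$X, keep)
res <- res[c("GroupID", "Y", "Z")]
res
```

Using explicit separators ("|" and ";") keeps rows such as (1, "2A") and (12, "A") from producing the same signature, provided the separators do not occur in the data.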

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

You may need to look into the compare function from the "compare" package. Here's a possibility:

library(compare)
x <- with(df, split(df[-1], df[[1]]))
Splits <- combn(names(x), 2)
Comparison <- apply(Splits, 2, function(y) {
  compare(x[y[1]], x[y[2]], allowAll = TRUE)$result
})
Splits[, Comparison]
# [1] "a" "c"

From this we can see that groups "a" and "c" are duplicated, and we can use that to subset the original dataset.
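
A minimal sketch of that subsetting step, under some assumptions: dup_pairs is a hypothetical stand-in for Splits[, Comparison, drop = FALSE] (hard-coded here so the snippet runs without the "compare" package), and "identical" is taken to be transitive, so dropping the second member of every matching pair leaves one copy of each group.

```r
df <- data.frame(
  X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1),
  Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC")
)

# Hypothetical stand-in for Splits[, Comparison, drop = FALSE]:
# one column per pair of groups that compared as identical
dup_pairs <- matrix(c("a", "c"), nrow = 2)

# combn() lists each pair in the order of names(x), so the second row
# always holds the later group of the pair; those are the ones to drop
drop_groups <- unique(dup_pairs[2, ])
out <- df[!df$X %in% drop_groups, ]
out
```

The drop = FALSE matters when only one pair matches: without it, Splits[, Comparison] collapses to a plain vector and indexing by row no longer works.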


I've used allowAll = TRUE in this answer, but you may want to look at the other options available in compare to decide what transformations you would actually want to allow in your comparisons.

Upvotes: 2
