Reputation: 3343
I have a data frame in which I want to compare groups of rows and remove a group only if it is entirely identical to another group. For example:
df <- data.frame(X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"), Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1), Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC"))
X Y Z
1 a 1 ABC
2 a 2 DEF
3 a 1 ABC
4 b 2 DEF
5 b 2 DEF
6 b 2 DEF
7 c 1 ABC
8 c 2 DEF
9 c 1 ABC
Here each group is identified by column X, and I want to compare the groups with one another. In this example, group a and group c are identical, so only one of them should be kept. The desired result is:
GroupID Y Z
1 1 1 ABC
2 1 2 DEF
3 1 1 ABC
4 2 2 DEF
5 2 2 DEF
6 2 2 DEF
Any idea how I can do this kind of comparison?
Upvotes: 3
Views: 142
Reputation: 67778
A base R possibility:
# For each 'X', collapse 'Y' and 'Z' to a single string
l <- by(df[ , c("Y", "Z")], df$X, function(dat) paste0(dat, collapse = ""))
# select names of unique list elements
nm <- names(l)[!duplicated(l)]
# use these names to subset the data frame
df[df$X %in% nm, ]
# X Y Z
# 1 a 1 ABC
# 2 a 2 DEF
# 3 a 1 ABC
# 4 b 2 DEF
# 5 b 2 DEF
# 6 b 2 DEF
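The desired output in the question also relabels the surviving groups with a numeric GroupID. A minimal sketch of that last step follows, assuming the groups should be numbered in order of first appearance. It is a variant of the signature idea above (the string is built with paste() on the columns rather than on the data frame itself), and the names sig, keep and res are only illustrative.

```r
# Same example data as in the question
df <- data.frame(
  X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1),
  Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC")
)

# One signature string per group: its rows pasted together in order
sig <- sapply(split(df[c("Y", "Z")], df$X),
              function(d) paste(d$Y, d$Z, collapse = "|"))

# Keep the first group for each distinct signature
keep <- names(sig)[!duplicated(sig)]
res <- df[df$X %in% keep, ]

# Number the surviving groups 1, 2, ... in order of first appearance
res$GroupID <- match(res$X, unique(res$X))
res <- res[c("GroupID", "Y", "Z")]
res
```

Note that the signature depends on row order within each group, so two groups with the same rows in a different order would not be treated as identical unless each group is sorted first.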
Upvotes: 2
Reputation: 193527
You may need to look into the compare function from the "compare" package. Here's a possibility:
library(compare)
x <- with(df, split(df[-1], df[[1]]))
Splits <- combn(names(x), 2)
Comparison <- apply(Splits, 2, function(y) {
  compare(x[y[1]], x[y[2]], allowAll = TRUE)$result
})
Splits[, Comparison]
# [1] "a" "c"
From this we can see that groups "a" and "c" are duplicated, and we can use that to subset the original dataset.
I've used allowAll = TRUE in this answer, but you may want to look at the other options available in compare to decide what transformations you would actually want to allow in your comparisons.
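The subsetting step mentioned above could look roughly like this. It is only a sketch: dup_pairs is a hypothetical stand-in for Splits[, Comparison, drop = FALSE] (hard-coded here so the example runs without the compare package), and it assumes you want to keep the first group of each matching pair and drop the second.

```r
# Same example data as in the question
df <- data.frame(
  X = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Y = c(1, 2, 1, 2, 2, 2, 1, 2, 1),
  Z = c("ABC", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "ABC")
)

# Hypothetical stand-in for Splits[, Comparison, drop = FALSE]:
# one column per pair of groups that compared as equal
dup_pairs <- matrix(c("a", "c"), nrow = 2)

# The second row holds the later group of each pair; drop those groups
to_drop <- unique(dup_pairs[2, ])
result <- df[!df$X %in% to_drop, ]
result
```

Using drop = FALSE when indexing Splits keeps the result a matrix even when only one pair matches, so the same code handles one or many duplicated pairs.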
Upvotes: 2