Reputation: 1068
Consider the data frame Data
created here:
Data <- data.frame(Location = rep(letters[1:20], each = 10))
I want to do pairwise comparisons manually. First, I want to find all possible pairwise combinations between the levels of Location
in Data
, so I make the object Pairs
like this:
Pairs <-expand.grid(unique(Data$Location),unique(Data$Location))
Now I want to remove rows from the object Pairs
where Pairs$Var1 == Pairs$Var2
, and I want to remove rows that are "duplicates" of previous pairs, but in the opposite order.
In other words, I want to remove rows where Pairs$Var1 == a
& Pairs$Var2 == a
, Pairs$Var1 == b
& Pairs$Var2 == b
, and so on (i.e., I don't want to compare Locations
to themselves), and I also don't want the same comparison to be made twice, so if Var1==a
has already been compared (or already exists earlier in the data.frame) to Var2 == b
, then I don't want to compare Var1 == b
to Var2 == a
, so I need to remove one of these combinations, but not the other (I hope that makes sense).
How can I do this?
We can use Pairs[Pairs$Var1 == Pairs$Var2,]
to see where Var1 == Var2
, but this doesn't help with the second problem
Upvotes: 1
Views: 3334
Reputation:
Your dataframe:
df <- data.frame(Location = rep(letters[1:20], each = 10))
df <-expand.grid(unique(df$Location),unique(df$Location))
dim(df) # 400 rows x 2 columns
Remove the duplicates:
df = df[!duplicated(t(apply(df, 1, sort))), ]
dim(df) # 210 rows x 2 columns
Remove Var1 = Var2
df = subset(df, df$Var1 != df$Var2)
dim(df) # 190 rows x 2 columns
Upvotes: 0
Reputation: 4658
A neat trick to make sure that two columns are different and that there are no symmetrical duplicates is to use the 'greater than' or 'smaller than' operator.
Pairs <- Pairs[as.character(Pairs$Var1) < as.character(Pairs$Var2),]
or, if you want the inverse,
Pairs <- Pairs[as.character(Pairs$Var1) >= as.character(Pairs$Var2),]
This works because a < a
is False
(an item is not less than itself), and for every pair (a, b)
either a < b
or b < a
is True
, the other is False
. That way, for every such pair you only keep one.
Therefore as.character(Pairs$Var1) < as.character(Pairs$Var2)
returns a vector of True
and False
that you can use to slice your data.frame
. The as.character()
is needed because one cannot compare factor
s using <
.
Upvotes: 3
Reputation: 395
I post a solution which uses a for
:
First, remove rows with the same value in both columns:
Pairs <- Pairs[Pairs$Var1 != Pairs$Var2,]
Second, remove "standard" duplicates:
Pairs <- Pairs[!duplicated(Pairs),]
Finally, remove duplicates that are in opposite order. My strategy involves creating a temporary columns that allows you (a) not to make the search for cases that you know already are duplicates; (b) make the final filtering. Then, you can remove the temporary column:
Pairs$my_duplicated <- FALSE
for(i in 1:nrow(Pairs)){
if(Pairs$my_duplicated[i] == FALSE){
my_test <- Pairs$Var2 %in% Pairs$Var1[i] & Pairs$Var1 %in% Pairs$Var2[i]
Pairs$my_duplicated[my_test] <- TRUE
}
}
Pairs <- Pairs[!Pairs$my_duplicated,]
Pairs$my_duplicated <- NULL
Upvotes: 1