Reputation: 3234
I'm trying to make sense of how to sort one data.frame based on multiple columns in another. This question does this with vectors. Can someone suggest a way to do the equivalent with data.frames?
Here's some sample data.
x1 <- data.frame(a=1:5, b=letters[1:5], c=rnorm(5))
x2 <- data.frame(a=c(4,4,2), b=c("d", "d", "b"), d=rnorm(3))
So I want to sort x2 by the first two columns of x1. My actual data is much more complicated, but this replicates the idea...
Upvotes: 18
Views: 23806
Reputation: 420
Here is an updated version of how to sort a dataframe:
> right_join(x1, x2[,1:2], by=c("a","b"))
a b c
1 2 b 0.6182433
2 4 d -2.2239003
3 4 d -2.2239003
Upvotes: 1
Reputation: 4692
This can be done precisely using plyr
. Joris' answer would work fairly well, but potentially could missort when combining strings:
> paste ("A A","B")
[1] "A A B"
> paste ("A","A B")
[1] "A A B"
You can get an exact answer using join.keys
and match
:
x1 <- data.frame(a=rep(1:3,2), b=rep(letters[2:4],each=2), c=rnorm(6))
x2 <- data.frame(a=c(3,3,2), b=c("c", "d", "b"), d=rnorm(3))
library(plyr)
keys<-join.keys(x1,x2,c("a","b"))
matches<-match(keys$y,keys$x,nomatch=(keys$n+1))
x2[order(matches),]
This should handle most edge cases, mismatched list sizes, etc. Items without a match in both the index columns are put at the end of the list.
Upvotes: 3
Reputation: 108583
It really depends on what your data really looks like. As it looks right now, you only need one column to sort, and that is easily done by:
x2[order(match(x2[,1],x1[,1])),]
If you need more than one column, this becomes a bit trickier. You will have to specify which one you want to sort first on, and which one second, eg :
x1 <- data.frame(a=rep(1:3,2), b=rep(letters[2:4],each=2), c=rnorm(6))
x2 <- data.frame(a=c(3,3,2), b=c("c", "d", "b"), d=rnorm(3))
x2[order(match(
paste(x2[,1],x2[,2]),
paste(x1[,1],x1[,2]))
),]
This sorts on the first column first, and then on the second. You have to keep in mind that you need all combinations in x2 also in x1. T
Upvotes: 32
Reputation: 44142
Attach a rank column to the relevant columns of x1:
len <- dim(x1)[1]
x1. <- cbind(x1[,1:2], rank=1:len)
Merge into x2 (this is like a SQL join; see the merge documentation for how to specify what happens if there are ambiguities such as multiple matches or no matches):
x2. <- merge(x2, x1.)
Sort:
x2.[order(x2.[,'rank']),]
Upvotes: 4