Reputation: 902
I have two data frames (df1
and df2
) and I want to subset df2 based on the first two columns contained in df1. For example,
df1 = data.frame(x=c(1,1,1,1,1),y=c(1,2,3,4,5),value=c(3,4,5,6,7))
df2 = data.frame(x=c(1,1,1,1,1,2), y=c(5,3,4,2,1,6), value=c(8,9,10,11,12,13))
As we can see, row 6 (2,6)
in df2
is not included in df1
, so I will just select row 1 to row 5 in df2.
Also, I want to rearrange df2
based on df1
. The final result should be like this:
Thanks for any help.
Upvotes: 1
Views: 2856
Reputation: 28705
When using merge, by default the data frames are joined by the variables they have in common, and the results are sorted. So you can do:
merge(df2, df1[c('x', 'y')])
# x y value
# 1 1 1 12
# 2 1 2 11
# 3 1 3 9
# 4 1 4 10
# 5 1 5 8
To sort by the order of df1
, use @Mankind_008's method
merge(df1[c('x','y')], df2 , sort = F)
Example:
set.seed(0)
df1 <- df1[sample(seq_len(nrow(df1))),]
df2 <- df2[sample(seq_len(nrow(df2))),]
df1
# x y value
# 5 1 5 7
# 2 1 2 4
# 4 1 4 6
# 3 1 3 5
# 1 1 1 3
merge(df1[c('x','y')], df2 , sort = F)
# x y value
# 1 1 5 8
# 2 1 2 11
# 3 1 4 10
# 4 1 3 9
# 5 1 1 12
Upvotes: 3
Reputation: 1001
Use data tables:
library(data.table)
Create your data as data.table:
df1 <- data.table( x = c(1,1,1,1,1), y = c(1,2,3,4,5), value = c(3,4,5,6,7) )
df2 <- data.table( x = c(1,1,1,1,1,2), y = c(5,3,4,2,1,6), value = c(8,9,10,11,12,13) )
Or convert your existing data.frames:
df1 <- as.data.table( df1 )
df2 <- as.data.table( df2 )
Then:
df2[ df1, on = .(x,y) ]
Any column in df1 that have the same name in df2 will be renamed as i.columnname:
x y value i.value
1: 1 1 12 3
2: 1 2 11 4
3: 1 3 9 5
4: 1 4 10 6
5: 1 5 8 7
Note that it already order by x and y. If you want to order by the column 'value' (or any other):
df2[ df1, on = .(x,y) ][ order(value) ]
The advantage of using data.table (or dplyr, as the solution proposed by AntoniosK) is that you can keep the two data sets separated.
Upvotes: 1