How to subset a dataframe based on columns from another dataframe?

Question

I have two data frames (df1 and df2) and I want to subset df2 based on the first two columns contained in df1. For example,

df1 = data.frame(x=c(1,1,1,1,1),y=c(1,2,3,4,5),value=c(3,4,5,6,7))
df2 = data.frame(x=c(1,1,1,1,1,2), y=c(5,3,4,2,1,6), value=c(8,9,10,11,12,13))

As we can see, row 6 of df2 (x = 2, y = 6) has no matching (x, y) pair in df1, so I want to keep only rows 1 to 5 of df2.

Also, I want to rearrange df2 so that its rows follow the order of df1. The final result should be like this:

  x y value
1 1 1    12
2 1 2    11
3 1 3     9
4 1 4    10
5 1 5     8

Thanks for any help.

IceCreamToucan · Accepted Answer

By default, merge joins the data frames on the columns they have in common, keeps only the rows that match in both (so the (2, 6) row of df2 is dropped), and sorts the result on those columns. So you can do:

merge(df2, df1[c('x', 'y')])

#   x y value
# 1 1 1    12
# 2 1 2    11
# 3 1 3     9
# 4 1 4    10
# 5 1 5     8
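
If you prefer not to rely on the shared column names being detected automatically, the key columns can be named explicitly. This is a small sketch using the data from the question; `res` is just an illustrative name.

```r
df1 <- data.frame(x = c(1, 1, 1, 1, 1), y = c(1, 2, 3, 4, 5), value = c(3, 4, 5, 6, 7))
df2 <- data.frame(x = c(1, 1, 1, 1, 1, 2), y = c(5, 3, 4, 2, 1, 6), value = c(8, 9, 10, 11, 12, 13))

# Only the key columns are taken from df1, so `value` in the result comes from df2.
# `by` names the join keys explicitly instead of using all common columns.
res <- merge(df2, df1[c("x", "y")], by = c("x", "y"))
res
```

Dropping `value` from df1 before merging matters: if it were kept, merge would treat it as a third common column and join on it as well.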

To keep the row order of df1 instead, use @Mankind_008's method: put df1 first and turn off sorting.

merge(df1[c('x','y')], df2, sort = FALSE)

Example:

set.seed(0)
df1 <- df1[sample(seq_len(nrow(df1))),]
df2 <- df2[sample(seq_len(nrow(df2))),]
df1
#   x y value
# 5 1 5     7
# 2 1 2     4
# 4 1 4     6
# 3 1 3     5
# 1 1 1     3
merge(df1[c('x','y')], df2, sort = FALSE)
#   x y value
# 1 1 5     8
# 2 1 2    11
# 3 1 4    10
# 4 1 3     9
# 5 1 1    12
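
One caveat: the documentation for merge describes the row order with sorting turned off as unspecified, even though in practice it follows the first data frame as shown above. If the order of df1 has to be guaranteed, one alternative (not part of the original answer) is to look up each df1 key in df2 with match. The sketch below assumes each (x, y) pair occurs at most once in df2 and that the values never contain the "_" separator; `key1`, `key2`, `idx` and `res` are illustrative names.

```r
# df1 in the shuffled order from the example above; df2 as in the question.
df1 <- data.frame(x = c(1, 1, 1, 1, 1), y = c(5, 2, 4, 3, 1), value = c(7, 4, 6, 5, 3))
df2 <- data.frame(x = c(1, 1, 1, 1, 1, 2), y = c(5, 3, 4, 2, 1, 6), value = c(8, 9, 10, 11, 12, 13))

# Build one key per row from the two join columns.
key1 <- paste(df1$x, df1$y, sep = "_")
key2 <- paste(df2$x, df2$y, sep = "_")

# For each df1 row, find the position of the matching df2 row (NA if none).
idx <- match(key1, key2)

# Index df2 by those positions, skipping df1 rows that have no match.
res <- df2[idx[!is.na(idx)], ]
res
```

Rows of df2 whose key is absent from df1, such as (2, 6), are never selected, so this both subsets and reorders in one step.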
