Reputation: 3512
My head stands still at the moment. I would like to match/extract data from a larger data.frame (df) based on the columns in a smaller data.frame (mdf). What I'm getting stuck on is the fact that I want to match multiple columns (two in this case). I have tried different approaches using e.g. merge
, which
, match
%in%
but non have succeeded.
# Dummy example
# Large df
df <- mtcars[1:6,1:3]
df$car_1 <- rownames(df)
df$car_2 <- rownames(tail(mtcars))
# df to match
mdf <- df[c("car_1","car_2")][3:6,]
rownames(df) <- NULL
rownames(mdf) <- NULL
The desired output would look something like
mpg cyl disp car_1 car_2
22.8 4 108 Datsun 710 Ford Pantera L
21.4 6 258 Hornet 4 Drive Ferrari Dino
18.7 8 360 Hornet Sportabout Maserati Bora
18.1 6 225 Valiant Volvo 142E
This feels like it should be very straight forward.
Any pointer would be highly appreciated, thanks!
Upvotes: 10
Views: 23859
Reputation: 39647
In case you would use match or %in% on multiple columns you could use interaction, paste or use a list to match on multiple columns.
df[match(interaction(mdf), interaction(df[c("car_1", "car_2")])),]
df[match(paste(mdf$car_1, mdf$car_2), paste(df$car_1, df$car_2),),]
df[match(asplit(mdf, 1), asplit(df[c("car_1", "car_2")], 1)),]
df[interaction(df[c("car_1", "car_2")]) %in% interaction(mdf),]
Upvotes: 10
Reputation: 3615
How about merge(df, mdf, all.x = FALSE, all.y = TRUE)
?
Edit: If you have different column names you can specify which ones to merge on, e.g.:
names(mdf) <- c("car_3", "car_4")
merge(df, mdf, by.x = c("car_1", "car_2"), by.y = c("car_3", "car_4"),
all.x = FALSE, all.y = TRUE)
Upvotes: 11
Reputation: 23574
Another way would be:
library(dplyr)
inner_join(df, mdf)
#Joining by: c("car_1", "car_2")
# car_1 car_2 mpg cyl disp
#1 Datsun 710 Ford Pantera L 22.8 4 108
#2 Hornet 4 Drive Ferrari Dino 21.4 6 258
#3 Hornet Sportabout Maserati Bora 18.7 8 360
#4 Valiant Volvo 142E 18.1 6 225
Upvotes: 7