Reputation: 11
I have two different data frames in R:
df1:
# | V1 | V2 |
---|---|---|
1 | 200 | 300 |
2 | 201 | 301 |
3 | 202 | 302 |
df2:
# | V1 | V2 | week |
---|---|---|---|
1 | 200 | 300 | 12-02-2018 |
2 | 301 | 201 | 25-05-2017 |
3 | 302 | 202 | 02-12-2016 |
I am looking to merge them together with a VLOOKUP equivalent.
The idea would be to add the week from df2 to df1 IF:
(df1$V1 == df2$V1 & df1$V2 == df2$V2) OR (df1$V1 == df2$V2 & df1$V2 == df2$V1).
Since V1 and V2 are assigned randomly, I need the condition to work both ways.
Any help?
Thanks a lot!
Upvotes: 1
Views: 197
Reputation: 389047
Sort the columns V1 and V2 in both data frames, then perform the merge.
df1 <- transform(df1, V1 = pmin(V1, V2), V2 = pmax(V1, V2))
df2 <- transform(df2, V1 = pmin(V1, V2), V2 = pmax(V1, V2))
merge(df1, df2, by = c('id', 'V1', 'V2'))
# id V1 V2 week
#1 1 200 300 12-02-2018
#2 2 201 301 25-05-2017
#3 3 202 302 02-12-2016
data
df1 <- structure(list(id = 1:3, V1 = 200:202, V2 = 300:302),
row.names = c(NA, -3L), class = "data.frame")
df2 <- structure(list(id = 1:3, V1 = c(200L, 301L, 302L), V2 = c(300L,
201L, 202L), week = c("12-02-2018", "25-05-2017", "02-12-2016"
)), row.names = c(NA, -3L), class = "data.frame")
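The merge above also keys on id, which only works when the ids happen to line up in both data frames. As a rough sketch of the same sorting idea keyed on the value pair alone, the example below assumes there is no shared id column and stores the sorted pair in two helper columns, lo and hi (hypothetical names, not from the question), so that df1 keeps its original V1 and V2:

```r
# Same sample values as above, but without an id column (an assumption)
df1 <- data.frame(V1 = 200:202, V2 = 300:302)
df2 <- data.frame(V1 = c(200L, 301L, 302L),
                  V2 = c(300L, 201L, 202L),
                  week = c("12-02-2018", "25-05-2017", "02-12-2016"))

# lo/hi are hypothetical helper columns: the smaller and larger value of each pair
df1$lo <- pmin(df1$V1, df1$V2)
df1$hi <- pmax(df1$V1, df1$V2)
df2$lo <- pmin(df2$V1, df2$V2)
df2$hi <- pmax(df2$V1, df2$V2)

# Left join on the order-independent key; all.x = TRUE keeps unmatched df1 rows
res <- merge(df1, df2[, c("lo", "hi", "week")], by = c("lo", "hi"), all.x = TRUE)

# Drop the helper columns again
res <- res[, c("V1", "V2", "week")]
res
```

Rows of df1 without a partner in df2 would come back with NA in week, which is close to how VLOOKUP behaves.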
Upvotes: 1
Reputation: 586
You could merge first on V1 (df1) = V1 (df2) and V2 (df1) = V2 (df2), then take the rows from df1 that found no match. Those rows can be merged a second time on V1 (df1) = V2 (df2) and V2 (df1) = V1 (df2), which mimics the 'OR' condition in the order you stated.
#Replicates of your dataframes
df1 <- data.frame(matrix(c(1, 200, 300,
2, 201, 301,
3, 202, 302), ncol=3, byrow = TRUE))
colnames(df1) <- c("iddf1", "V1", "V2")
#Build df2 column by column so V1 and V2 stay numeric
#(a matrix mixing numbers and date strings coerces everything to character)
df2 <- data.frame(iddf2 = 1:3,
V1 = c(200, 301, 302),
V2 = c(300, 201, 202),
week = c("12-02-2018", "25-05-2017", "02-12-2016"))
#Merge first on V1 (df1) = V1 (df2) and V2 (df1) = V2 (df2), keeping all rows of df1
df.merged.1 <- merge(df1, df2, by = c("V1", "V2"), all.x = TRUE)
#Extract the rows that did not match
df1.unmet <- df.merged.1[is.na(df.merged.1$iddf2),c("iddf1", "V1", "V2")]
df.merged.1 <- df.merged.1[!is.na(df.merged.1$iddf2),]
#Then merge on V1 (df1) = V2 (df2) and V2 (df1) = V1 (df2)
df.merged.2 <- merge(df1.unmet, df2, by.x=c("V1", "V2"), by.y = c("V2", "V1"))
#rbind the two dataframes to get the final result
df.merged <- rbind(df.merged.1, df.merged.2)
df.merged
# V1 V2 iddf1 iddf2 week
#1 200 300 1 1 12-02-2018
#2 201 301 2 2 25-05-2017
#3 202 302 3 3 02-12-2016
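For comparison, here is a sketch of a one-pass variant, which is a different technique from the two-step merge above: append a copy of df2 with V1 and V2 exchanged, then merge once. The names df2.swapped, df2.both and one.pass are made up for illustration, and the data frames are rebuilt with plain data.frame() calls so the block runs on its own.

```r
# Sample data with the same values and column names as above
df1 <- data.frame(iddf1 = 1:3, V1 = c(200, 201, 202), V2 = c(300, 301, 302))
df2 <- data.frame(iddf2 = 1:3,
                  V1 = c(200, 301, 302),
                  V2 = c(300, 201, 202),
                  week = c("12-02-2018", "25-05-2017", "02-12-2016"))

# Copy of df2 with V1 and V2 exchanged
# (transform() evaluates both expressions against the original columns)
df2.swapped <- transform(df2, V1 = V2, V2 = V1)

# Stack both orientations; unique() drops the duplicate created when V1 == V2
df2.both <- unique(rbind(df2, df2.swapped))

# A single left join now covers both sides of the 'OR' condition
one.pass <- merge(df1, df2.both, by = c("V1", "V2"), all.x = TRUE)
one.pass
```

One difference to keep in mind: if df2 contained both (a, b) and (b, a) as separate rows, this variant would return both matches for the corresponding df1 row, whereas the two-step merge returns only the direct match.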
Upvotes: 0