Reputation: 3677
I have a R dataframe df_big
Candidate Status
A 1
B 10
C 12
D 15
E 25
and so on
I have a second dataframe df_small
Candidate_1 Candidate_2
A C
B E
C D
I want to merge df_small
and df_big
to get df_final
which looks like
Candidate_1 Candidate_2 Status_1 Status_2
A C 1 12
B E 10 25
C D 12 15
I tried something to the effect
df_small_1 = merge(x=df_small,y = df_big,by.x = "Candidate_1",by.y="Candidate")
df_small_2 = merge(x=df_small,y = df_big,by.x = "Candidate_2",by.y="Candidate")
but I do not know how to combine df_small_1
and df_small_2
to df_small
Upvotes: 0
Views: 74
Reputation: 2725
Merging is an expensive operation. You could better do this without the need for merge operation using a combination of which and indexing. I have benchmarked on the merge and non-merge solution. The answer also gives the order of columns exactly as needed.
doit <- function(df_small, df_big)
{
# Which elements do we need to copy
indx1 <- df_big[["Candidate"]] %in% df_small[["Candidate_1"]]
indx2 <- df_big[["Candidate"]] %in% df_small[["Candidate_2"]]
# Copy them
df_needed <- data.frame(Candiate_1 = df_big[indx1, "Candidate"], Candiate_2 = df_big[indx2, "Candidate"],
Status_1 = df_big[indx1, "Status"], Status_2 = df_big[indx2, "Status"])
}
#merge two times
doit_merge <- function(df_small, df_big)
{
df_result <- merge(x=df_small, y=df_big, by.x="Candidate_1", by.y="Candidate")
df_result <- merge(x=df_result, y=df_big, by.x="Candidate_2", by.y="Candidate")
}
library(microbenchmark)
# benchmark results
microbenchmark(
doit(df_small, df_big) ,
doit_merge(df_small, df_big)
)
RESULTS
Unit: microseconds
expr min lq mean median uq max neval cld
doit(df_small, df_big) 676.570 758.472 1077.203 834.0115 978.9315 4591.473 100 a
doit_merge(df_small, df_big) 1329.327 1449.205 1986.995 1612.3940 2021.9070 5966.780 100 b
Upvotes: 0
Reputation: 522626
You need to join twice, once for each of the two candidates' status:
df_result <- merge(x=df_small, y=df_big, by.x="Candidate_1", by.y="Candidate")
df_result <- merge(x=df_result, y=df_big, by.x="Candidate_2", by.y="Candidate")
Upvotes: 2