Reputation: 419
I have 2 data frames df_1 and df_2. They share 3 cols in common: permno, cusip, and ticker. Each row of df_1 is a unique stock. The permno, cusip, and ticker in df_1 are used to identify the stock return in df_2. Sometimes one or two of these these variables are not available, but in each row at least one of the three is available. And I will use that value to look up the return in df_2.
Can you suggest any (fast) way to merge df_1 and df_2 if there is a match in at least one of three columns permno, cusip, or ticker.
df_1
id permno cusip ticker
1 1 11 AA
2 NA 12 NA
3 2 13 NA
4 5 NA NA
df_2
permno cusip ticker return date
1 11 NA 100 date_1
7 15 BX 102 date_2
2 NA CU 103 date_3
Desired result
id permno cusip ticker return date
1 1 11 AA 100 date_1
1 1 11 NA 100 date_1
3 2 13 NA 103 date_3
3 2 NA CU 103 date_3
Upvotes: 2
Views: 1188
Reputation: 4686
This should work.
# define common columns in both data frames
colmatch <- c("permno", "cusip", "ticker")
# function to trim down data frame A to just those with rows
# that have at least one match in common column with data frame B
# and append columns from B which are not found in A
simplify <- function(df1, df2, col = colmatch) {
# find all common column elements that matches
idx <- sapply(col, function(x)
match(df1[[x]], df2[[x]], incomparables=NA)
)
# find rows in first data frame with at least one match
idx1 <- which(apply(idx, 1, function(x) !all(is.na(x))))
# find corresponding rows in second data frame
idx2 <- apply(idx[idx1, ], 1, function(x) x[min(which(!is.na(x)))])
# copy columns from second data frame to first data frame
# only for rows which matches above
dff <- cbind(df1[idx1, ], df2[idx2, !(names(df2) %in% colmatch), drop=F])
}
# assemble the final output
df_final <- rbind(simplify(df_1, df_2), # find df_1 rows with matches in df_2
simplify(df_2, df_1)) # and vice versa
The final output (if you prefer it sorted by id
)
> df_final[order(df_final$id), ]
id permno cusip ticker return date
1 1 1 11 AA 100 date_1
11 1 1 11 <NA> 100 date_1
3 3 2 13 <NA> 103 date_3
31 3 2 NA CU 103 date_3
Upvotes: 1