Reputation: 21
I have two dataframes as following
DF1:
ncusip ticker YEAR
04621X10 AIZ 2008
NA CCI 2017
NA AEUXX 2007
71343P20 PAS 2008
71383910 PCR 2010
83409F20 SOI 2017
and the second is
DF2:
NUSIP Ticker YEAR ES
AEUXX 2007 0.5
0 MCRL 2010 0.15
#N/A CCI 2017 0.125
04621X10 AIZ 2008 0.5
04650Y10 HOME 2018 0.375
I would like to match DF1 with DF2 using c(ncusip, CUSIP) firstly and then using c(ticker, Ticker) if if there is no cusip in that year, to obtain corresponding "ES" value and get the following results.
DF1:
ncusip ticker YEAR ES
04621X10 AIZ 2008 0.5
NA CCI 2017 0.125
NA AEUXX 2007 0.5
71343P20 PAS 2008 NA
71383910 PCR 2010 NA
83409F20 SOI 2017 NA
I tried
match.1 <- match(paste(df1$ncusip, df1$YEAR), paste(df2$CUSIP, df2$YEAR))
df1$ES <- df2$ES[match.1]
match.2 <- match(paste(df1$tikcer[df2$CUSIP %in% c("0", "", "#N/A")], df1$YEAR), paste(df2$Ticker[df2$CUSIP %in% c("0", "", "#N/A")], df2$YEAR))
df1$ES <- df2$ES[match.2]
df1 <- df1 %>%
mutate(ES = ifelse(is.na(ncusip)&is.na(ticker), NA, ES))
But I found that the results of ES appear errors or say chaos. I would appreciate if someone could help me sort this issue!
Upvotes: 1
Views: 74
Reputation: 19088
A solution using merge
. I "cleaned" df2
by replacing
and #N/A
with NA
.
merge(df1, df2[,-3], by.x=c("ncusip","ticker"),
by.y=c("NUSIP","Ticker"), all.x=T)
ncusip ticker YEAR ES
1 04621X10 AIZ 2008 0.500
2 71343P20 PAS 2008 NA
3 71383910 PCR 2010 NA
4 83409F20 SOI 2017 NA
5 <NA> AEUXX 2007 0.500
6 <NA> CCI 2017 0.125
df1 <- structure(list(ncusip = c("04621X10", NA, NA, "71343P20", "71383910",
"83409F20"), ticker = c("AIZ", "CCI", "AEUXX", "PAS", "PCR",
"SOI"), YEAR = c(2008L, 2017L, 2007L, 2008L, 2010L, 2017L)), class = "data.frame", row.names = c(NA,
-6L))
df2 <- structure(list(NUSIP = c(NA, "0", NA, "04621X10", "04650Y10"),
Ticker = c("AEUXX", "MCRL", "CCI", "AIZ", "HOME"), YEAR = c(2007L,
2010L, 2017L, 2008L, 2018L), ES = c(0.5, 0.15, 0.125, 0.5,
0.375)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1