zoe
zoe

Reputation: 21

R - How to match two dataframes based on multiple conditions?

I have two dataframes as following

DF1:
    ncusip           ticker      YEAR      
    04621X10           AIZ        2008
    NA                 CCI        2017
    NA                 AEUXX      2007
    71343P20           PAS        2008
    71383910           PCR        2010 
    83409F20           SOI        2017

and the second is

DF2:

NUSIP           Ticker      YEAR      ES
                   AEUXX      2007     0.5
0                  MCRL       2010     0.15
#N/A               CCI        2017     0.125
04621X10           AIZ        2008     0.5
04650Y10           HOME       2018     0.375

I would like to match DF1 with DF2 using c(ncusip, CUSIP) firstly and then using c(ticker, Ticker) if if there is no cusip in that year, to obtain corresponding "ES" value and get the following results.

DF1:

ncusip           ticker      YEAR     ES
04621X10           AIZ        2008      0.5
NA                 CCI        2017      0.125
NA                 AEUXX      2007      0.5
71343P20           PAS        2008      NA
71383910           PCR        2010      NA 
83409F20           SOI        2017      NA     

I tried

    match.1 <- match(paste(df1$ncusip, df1$YEAR), paste(df2$CUSIP, df2$YEAR))
    df1$ES <- df2$ES[match.1]
    match.2 <- match(paste(df1$tikcer[df2$CUSIP %in% c("0", "", "#N/A")], df1$YEAR), paste(df2$Ticker[df2$CUSIP %in% c("0", "", "#N/A")], df2$YEAR))
    df1$ES <- df2$ES[match.2]
df1 <- df1 %>%
  mutate(ES = ifelse(is.na(ncusip)&is.na(ticker), NA, ES))

But I found that the results of ES appear errors or say chaos. I would appreciate if someone could help me sort this issue!

Upvotes: 1

Views: 74

Answers (1)

Andre Wildberg
Andre Wildberg

Reputation: 19088

A solution using merge. I "cleaned" df2 by replacing and #N/A with NA.

merge(df1, df2[,-3], by.x=c("ncusip","ticker"), 
                     by.y=c("NUSIP","Ticker"), all.x=T)
    ncusip ticker YEAR    ES
1 04621X10    AIZ 2008 0.500
2 71343P20    PAS 2008    NA
3 71383910    PCR 2010    NA
4 83409F20    SOI 2017    NA
5     <NA>  AEUXX 2007 0.500
6     <NA>    CCI 2017 0.125

Data

df1 <- structure(list(ncusip = c("04621X10", NA, NA, "71343P20", "71383910", 
"83409F20"), ticker = c("AIZ", "CCI", "AEUXX", "PAS", "PCR", 
"SOI"), YEAR = c(2008L, 2017L, 2007L, 2008L, 2010L, 2017L)), class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(NUSIP = c(NA, "0", NA, "04621X10", "04650Y10"), 
Ticker = c("AEUXX", "MCRL", "CCI", "AIZ", "HOME"), YEAR = c(2007L, 
2010L, 2017L, 2008L, 2018L), ES = c(0.5, 0.15, 0.125, 0.5, 
0.375)), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

Related Questions