coding_is_fun
coding_is_fun

Reputation: 127

Merge data frames and replace NONE with values in R

I have two data.frames:

data.frame1:
CustID  FirstName   LastName    Address         DOB         City    Phone
132    Mary         K               999 Drive   1/1/2011    Chicago 888-0000
133    Mona         J               222 Road    1/4/2002    NY      999-8888
188    Jack         S               122 Street  9/2/2009    Washin  777-9999
None    Helen       L               111 Rd      1/4/2010        
None    John        M               888 Lane    4/2/2002        
None    Sally       K               222 Street  2/3/2002        


data.frame2                     
CustID FirstName LastName Address   DOB         City
132    Mary      K        999 Drive 1/1/2011    Chicago 
133    Mona      J         222 Road 1/4/2002    NY  
188    Jack      S      122 Street  9/2/2009    Washington  
3338    Helen   L         111 Rd    1/4/2010        
882     John    M       888 Lane    4/2/2002        
976    Sally    K     222 Street    2/3/2002    

Data.frame1 contains None in CustID column. I need to replace these Nones with a CustID from data.frame2 and make sure that columns FirstName, LastName, Address, DOB match from both data sets, because some names can match from both data sets but have different address and DOB - these are not the same people. I have converted these columns into character from factor (not sure if it matters), and applied match() function but received 0 matches (which i know is wrong) this is my code:

data.frame1$ID[match(c(data.framr2$'FirstName',
                     data.frame2$'LastName',
                     data.frame2$'DOB',
                     data.frame2$'Address'), 
                     c(data.frame1$'FirstName',
                     data.frame1$'LastName',
                     data.frame1$'DOB',
                     data.frame1$'Address'))]   

Upvotes: 0

Views: 70

Answers (2)

TBT8
TBT8

Reputation: 764

Here is one way using dplyr.

  library(dplyr)

  df1 <- read.table(text = 
       "CustID  FirstName   LastName    Address         DOB         City    Phone
  132    Mary         K               999Drive   1/1/2011    Chicago 888-0000
  133    Mona         J               222Road    1/4/2002    NY      999-8888
  188    Jack         S               122Street  9/2/2009    Washin  777-9999
  None    Helen       L               111Rd      1/4/2010     ''     ''
  None    John        M               888Lane    4/2/2002       ''   ''
  None    Sally       K               222Street  2/3/2002        ''  ''"
  , header = T, stringsAsFactors = F)


  df2 <- read.table(text=                    
  "CustID FirstName LastName Address   DOB         City
  132    Mary      K        999Drive 1/1/2011    Chicago 
  133    Mona      J         222Road 1/4/2002    NY  
  188    Jack      S      122Street  9/2/2009    Washington  
  3338    Helen   L         111Rd    1/4/2010     ''   
  882     John    M       888Lane    4/2/2002       '' 
  976    Sally    K     222Street    2/3/2002    ''"
  , header = T, stringsAsFactors = F)

  df1 %>% left_join(df2 %>% select(-City), by = c('FirstName', 'LastName', 'DOB', 'Address')) %>% 
       mutate(CustID = ifelse(CustID.y == "None", CustID.x, CustID.y)) %>% select(-CustID.x, -CustID.y)



        FirstName LastName   Address      DOB    City    Phone CustID
1      Mary        K  999Drive 1/1/2011 Chicago 888-0000    132
2      Mona        J   222Road 1/4/2002      NY 999-8888    133
3      Jack        S 122Street 9/2/2009  Washin 777-9999    188
4     Helen        L     111Rd 1/4/2010                    3338
5      John        M   888Lane 4/2/2002                     882
6     Sally        K 222Street 2/3/2002                     976

Upvotes: 1

aivanov
aivanov

Reputation: 121

This code should illustrate how you have to proceed:

  • merge the data.frames by "fname" and "lname" (consider only the rows where id is missing)
  • select the "id" column of the merged data.frame and copy it to df1

Example

df1 <- data.frame(id=c(NA, 12, NA, 13), 
    fname=c("A","B","Z","D"), 
    lname=c("1","2","3","4"))

df2 <- data.frame(id=c(1, 21, 33, 44), 
    fname=c("Z","A","A","Z")  , 
    lname=c("1","3","1","3"))

df1[!complete.cases(df1),1] <- merge(
    x=df1[!complete.cases(df1[,"id"]),], 
    y=df2, 
    by=c("fname", "lname"))[,"id.y"]

Upvotes: 1

Related Questions