Reputation: 127
I have two data.frames:
data.frame1:
CustID FirstName LastName Address DOB City Phone
132 Mary K 999 Drive 1/1/2011 Chicago 888-0000
133 Mona J 222 Road 1/4/2002 NY 999-8888
188 Jack S 122 Street 9/2/2009 Washin 777-9999
None Helen L 111 Rd 1/4/2010
None John M 888 Lane 4/2/2002
None Sally K 222 Street 2/3/2002
data.frame2
CustID FirstName LastName Address DOB City
132 Mary K 999 Drive 1/1/2011 Chicago
133 Mona J 222 Road 1/4/2002 NY
188 Jack S 122 Street 9/2/2009 Washington
3338 Helen L 111 Rd 1/4/2010
882 John M 888 Lane 4/2/2002
976 Sally K 222 Street 2/3/2002
Data.frame1 contains None in CustID column. I need to replace these Nones with a CustID from data.frame2 and make sure that columns FirstName, LastName, Address, DOB match from both data sets, because some names can match from both data sets but have different address and DOB - these are not the same people. I have converted these columns into character from factor (not sure if it matters), and applied match() function but received 0 matches (which i know is wrong) this is my code:
data.frame1$ID[match(c(data.framr2$'FirstName',
data.frame2$'LastName',
data.frame2$'DOB',
data.frame2$'Address'),
c(data.frame1$'FirstName',
data.frame1$'LastName',
data.frame1$'DOB',
data.frame1$'Address'))]
Upvotes: 0
Views: 70
Reputation: 764
Here is one way using dplyr
.
library(dplyr)
df1 <- read.table(text =
"CustID FirstName LastName Address DOB City Phone
132 Mary K 999Drive 1/1/2011 Chicago 888-0000
133 Mona J 222Road 1/4/2002 NY 999-8888
188 Jack S 122Street 9/2/2009 Washin 777-9999
None Helen L 111Rd 1/4/2010 '' ''
None John M 888Lane 4/2/2002 '' ''
None Sally K 222Street 2/3/2002 '' ''"
, header = T, stringsAsFactors = F)
df2 <- read.table(text=
"CustID FirstName LastName Address DOB City
132 Mary K 999Drive 1/1/2011 Chicago
133 Mona J 222Road 1/4/2002 NY
188 Jack S 122Street 9/2/2009 Washington
3338 Helen L 111Rd 1/4/2010 ''
882 John M 888Lane 4/2/2002 ''
976 Sally K 222Street 2/3/2002 ''"
, header = T, stringsAsFactors = F)
df1 %>% left_join(df2 %>% select(-City), by = c('FirstName', 'LastName', 'DOB', 'Address')) %>%
mutate(CustID = ifelse(CustID.y == "None", CustID.x, CustID.y)) %>% select(-CustID.x, -CustID.y)
FirstName LastName Address DOB City Phone CustID
1 Mary K 999Drive 1/1/2011 Chicago 888-0000 132
2 Mona J 222Road 1/4/2002 NY 999-8888 133
3 Jack S 122Street 9/2/2009 Washin 777-9999 188
4 Helen L 111Rd 1/4/2010 3338
5 John M 888Lane 4/2/2002 882
6 Sally K 222Street 2/3/2002 976
Upvotes: 1
Reputation: 121
This code should illustrate how you have to proceed:
Example
df1 <- data.frame(id=c(NA, 12, NA, 13),
fname=c("A","B","Z","D"),
lname=c("1","2","3","4"))
df2 <- data.frame(id=c(1, 21, 33, 44),
fname=c("Z","A","A","Z") ,
lname=c("1","3","1","3"))
df1[!complete.cases(df1),1] <- merge(
x=df1[!complete.cases(df1[,"id"]),],
y=df2,
by=c("fname", "lname"))[,"id.y"]
Upvotes: 1