Reputation: 574
I have two datasets:
Contacts2: This contains a list of ~100,000 contacts, their respective titles and a set of columns which describes the types of work contacts could be involved in. Here's an example dataset:
First<-c("George","Thomas","James","Jimmy","Howard","Herbert")
Last<-c("Washington", "Jefferson", "Madison", "Carter", "Taft", "Hoover")
Title<-c("CEO", "Accountant","Communications Specialist", "President", "Accountant", "CFO")
Finance<-NA
Executive<-NA
Communications<-NA
Contacts2<-as.data.frame(cbind(First,Last,Title,Finance,Executive,Communications))
First Last Title Finance Executive Communications
1 George Washington CEO <NA> <NA> <NA>
2 Thomas Jefferson Accountant <NA> <NA> <NA>
3 James Madison Communications Specialist <NA> <NA> <NA>
4 Jimmy Carter President <NA> <NA> <NA>
5 Howard Taft Accountant <NA> <NA> <NA>
6 Herbert Hoover CFO <NA> <NA> <NA>
Note the last three columns are numeric.
TableOfTitle: This dataset contains a list of ~1,000 unique titles and the same set of columns in which describes the type of work the contacts could be involved in. For each title I've put an 1 in the column(s) of the roles that describe that person's job.
Title<-c("CEO","Accountant", "Communications Specialist", "President", "CFO")
Finance<-c(NA,1,NA,1,1)
Executive<-c(1,NA,NA,NA,1)
Communications<-c(NA,NA,1,NA,NA)
TableOfTitle<-as.data.frame(cbind(Title,Finance,Executive,Communications))
Title Finance Executive Communications
1 CEO <NA> 1 <NA>
2 Accountant 1 <NA> <NA>
3 Communications Specialist <NA> <NA> 1
4 President 1 <NA> <NA>
5 CFO 1 1 <NA>
Note the last three columns are numeric.
I'm now trying to match the check boxes in TableOfTitle
in Contacts2
based on the contact title field. For example, since TableOfTitle
shows anyone with the title of CFO should have an x in the Finance and Executive field, the record for Herbert Hoover in Contacts2
should also have 1s in those columns as well.
Upvotes: 2
Views: 54
Reputation: 2796
Here's a solution that uses dplyr
. It is essentially what some commenters have already recommended, except that this fulfills the request of not copying over any pre-existing data in the last 3 columns of Contacts2
.
Note that ifelse()
can be very slow with large datasets, but for your stated task this shouldn't really be noticeable. Algorithmically, this solution is also a bit clumsy in other ways, but I went for maximum readability here.
Contacts2 <- left_join(Contacts2, TableOfTitle, by = "Title") %>%
transmute(First = First,
Last = Last,
Title = Title,
Finance = ifelse(is.na(Finance.x), Finance.y, Finance.x),
Executive = ifelse(is.na(Executive.x), Executive.y, Executive.x),
Communications = ifelse(is.na(Communications.x), Communications.y, Communications.x))
Example output:
First Last Title Finance Executive Communications
George Washington CEO <NA> 1 <NA>
Thomas Jefferson Accountant 1 <NA> <NA>
James Madison Communications Specialist <NA> <NA> 1
Jimmy Carter President 1 <NA> <NA>
Howard Taft Accountant 1 <NA> <NA>
Herbert Hoover CFO 1 1 <NA>
Upvotes: 1