user3833190
user3833190

Reputation:

How to join/ merge two tables using character values?

I would like to combine two tables based on first name, last name, and year, and create a new binary variable indicating whether the row from table 1 was present in the 2nd table.

First table is a panel data set of some attributes of NBA players during a season:

   firstname<-c("Michael","Michael","Michael","Magic","Magic","Magic","Larry","Larry")
   lastname<-c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson","Bird","Bird")
   year<-c("1991","1992","1993","1991","1992","1993","1992","1992")

   season<-data.frame(firstname,lastname,year)


    firstname   lastname        year
  1 Michael      Jordan         1991
  2 Michael      Jordan         1992
  3 Michael      Jordan         1993
  4 Magic        Johnson        1991
  5 Magic        Johnson        1992
  6 Magic        Johnson        1993
  7 Larry        Bird           1992
  8 Larry        Bird           1992

The second data.frame is a panel data set of some attributes of NBA players selected to the All-Star game:

   firstname<-c("Michael","Michael","Michael","Magic","Magic","Magic")
   lastname<-c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson")
   year<-c("1991","1992","1993","1991","1992","1993")

    ALLSTARS<-data.frame(firstname,lastname,year)



     firstname  lastname    year
  1 Michael     Jordan    1991
  2 Michael     Jordan    1992
  3 Michael     Jordan    1993
  4 Magic       Johnson   1991
  5 Magic       Johnson   1992
  6 Magic       Johnson   1993

My desired result looks like:

  firstname lastname    year    allstars

   1    Michael Jordan  1991    1
   2    Michael Jordan  1992    1
   3    Michael Jordan  1993    1
   4    Magic   Johnson 1991    1
   5    Magic   Johnson 1992    1
   6    Magic   Johnson 1993    1
   7    Larry   Bird    1992    0
   8    Larry   Bird    1992    0

I tried to use a left join. But not sure whether that makes sense:

    test<-join(season, ALLSTARS, by =c("lastname","firstname","year") , type = "left", match = "all")

Upvotes: 6

Views: 1418

Answers (3)

Sam Firke
Sam Firke

Reputation: 23024

It looks like you are using join() from the plyr package. You were almost there: just preface your command with ALLSTARS$allstars <- 1. Then do your join as it is written and finally convert the NA values to 0. So:

ALLSTARS$allstars <- 1
test <- join(season, ALLSTARS, by =c("lastname","firstname","year") , type = "left", match = "all")
test$allstars[is.na(test$allstars)] <- 0

Result:

  firstname lastname year allstars
1   Michael   Jordan 1991        1
2   Michael   Jordan 1992        1
3   Michael   Jordan 1993        1
4     Magic  Johnson 1991        1
5     Magic  Johnson 1992        1
6     Magic  Johnson 1993        1
7     Larry     Bird 1992        0
8     Larry     Bird 1992        0

Though I personally would use left_join or right_join from the dplyr package, as in David's answer, instead of plyr's join(). Also note that you don't actually need the by argument of join() in this case as by default the function will try to join on all fields with common names, which is what you want here.

Upvotes: 1

Pierre L
Pierre L

Reputation: 28451

In base R:

ALLSTARS$allstars <- 1L
newdf <- merge(season, ALLSTARS, by=c('firstname', 'lastname', 'year'), all.x=TRUE)
newdf$allstars[is.na(newdf$allstars)] <- 0L 
newdf

Or one I like for a different approach:

season$allstars <- (apply(season, 1, function(x) paste(x, collapse='')) %in%
apply(ALLSTARS, 1, function(x) paste(x, collapse='')))+0L
# 
#   firstname lastname year allstars
# 1   Michael   Jordan 1991        1
# 2   Michael   Jordan 1992        1
# 3   Michael   Jordan 1993        1
# 4     Magic  Johnson 1991        1
# 5     Magic  Johnson 1992        1
# 6     Magic  Johnson 1993        1
# 7     Larry     Bird 1992        0
# 8     Larry     Bird 1992        0

Upvotes: 2

David Arenburg
David Arenburg

Reputation: 92302

Here's a simple solution using data.table binary join which allows you to update a column by reference while joing

library(data.table)
setkey(setDT(season), firstname, lastname, year)[ALLSTARS, allstars := 1L]
season
#    firstname lastname year allstars
# 1:     Larry     Bird 1992       NA
# 2:     Larry     Bird 1992       NA
# 3:     Magic  Johnson 1991        1
# 4:     Magic  Johnson 1992        1
# 5:     Magic  Johnson 1993        1
# 6:   Michael   Jordan 1991        1
# 7:   Michael   Jordan 1992        1
# 8:   Michael   Jordan 1993        1

Or using dplyr

library(dplyr)
ALLSTARS %>% 
  mutate(allstars = 1L) %>%
  right_join(., season)
#   firstname lastname year allstars
# 1   Michael   Jordan 1991        1
# 2   Michael   Jordan 1992        1
# 3   Michael   Jordan 1993        1
# 4     Magic  Johnson 1991        1
# 5     Magic  Johnson 1992        1
# 6     Magic  Johnson 1993        1
# 7     Larry     Bird 1992       NA
# 8     Larry     Bird 1992       NA

Upvotes: 4

Related Questions