Reputation: 1792

Create a dummy variable column that indicates if a record is in a second dataframe?

I have some two dataframes, df1 and df2, that take a format like this (reproducible code for data entry at the end):

df1
#>             name instrument
#> 1    John Lennon     guitar
#> 2    Mick Jagger     vocals
#> 3    Ringo Starr      drums
#> 4 Keith Richards     guitar

and

df2
#>              name beatles
#> 1     John Lennon       1
#> 2     Ringo Starr       1
#> 3 George Harrison       1
#> 4  Paul McCartney       1

I'd like to add a column to df1 that indicates whether a record also in df2 (and equal to 0 if not), so my desired output is:

output
#>             name instrument beatles
#> 1    John Lennon     guitar       1
#> 2    Mick Jagger     vocals       0
#> 3    Ringo Starr      drums       1
#> 4 Keith Richards     guitar       0

I've tried full_join(), but that pulls across rows in df2 that aren't in df1, which is not what I want. (i.e. The George Harrison and Paul McCartney rows should not be in the output)


library(tidyverse)

df1 %>% 
  full_join(df2) 

#> Joining, by = "name"
#>              name instrument beatles
#> 1     John Lennon     guitar       1
#> 2     Mick Jagger     vocals      NA
#> 3     Ringo Starr      drums       1
#> 4  Keith Richards     guitar      NA
#> 5 George Harrison       <NA>       1
#> 6  Paul McCartney       <NA>       1

Reproducible code below:

df1 <- data.frame(stringsAsFactors=FALSE,
         name = c("John Lennon", "Mick Jagger", "Ringo Starr", "Keith Richards"),
   instrument = c("guitar", "vocals", "drums", "guitar")
)

df2 <- data.frame(stringsAsFactors=FALSE,
                name = c("John Lennon", "Ringo Starr", "George Harrison",
                         "Paul McCartney"),
          beatles = c(1, 1, 1, 1)
       )

library(tidyverse)

df1 %>% 
  full_join(df2)

Upvotes: 3

Answers (6)

moodymudskipper

Reputation: 47330

We can use {powerjoin} and use the fill parameter to fill unmatched values with 0.

library(powerjoin)
power_left_join(df1, df2, beatles , by="name", fill = 0)
#>             name instrument beatles
#> 1    John Lennon     guitar       1
#> 2    Mick Jagger     vocals       0
#> 3    Ringo Starr      drums       1
#> 4 Keith Richards     guitar       0

Upvotes: 1

PavoDive

Reputation: 6496

A data.table solution for this left join would be:

df1[df2, on = "name", nomatch = 0]

Upvotes: 3

Ben Wynne-Morris

Reputation: 141

Left join then mutate to replace the NAs with 0s

> library(dplyr)

> output <- df1 %>% left_join(df2, "name") %>% mutate(beatles=ifelse(!is.na(beatles),1,0))
> output
            name instrument beatles
1    John Lennon     guitar       1
2    Mick Jagger     vocals       0
3    Ringo Starr      drums       1
4 Keith Richards     guitar       0

Upvotes: 2

Ronak Shah

Reputation: 389135

We can also match the name of the two dataframe and specify nomatch argument to 0. Assign 1 if the value is greater than 0. The + argument in the front converts logical value to integer.

df1$beatles <- +(match(df1$name, df2$name, nomatch = 0) > 0)
df1

#            name instrument beatles
#1    John Lennon     guitar       1
#2    Mick Jagger     vocals       0
#3    Ringo Starr      drums       1
#4 Keith Richards     guitar       0

Upvotes: 1

yarnabrina

Reputation: 1666

Another solution:

library(magrittr)

df1 <- data.frame(stringsAsFactors = FALSE,
                  name = c("John Lennon", "Mick Jagger", "Ringo Starr", "Keith Richards"),
                  instrument = c("guitar", "vocals", "drums", "guitar"))

df2 <- data.frame(stringsAsFactors = FALSE,
                  name = c("John Lennon", "Ringo Starr", "George Harrison",
                           "Paul McCartney"),
                  beatles = c(1, 1, 1, 1))

df1 %>%
  dplyr::left_join(y = df2,
                   by = "name") %>%
  tidyr::replace_na(replace = list(beatles = 0))
#>             name instrument beatles
#> 1    John Lennon     guitar       1
#> 2    Mick Jagger     vocals       0
#> 3    Ringo Starr      drums       1
#> 4 Keith Richards     guitar       0

Upvotes: 2

Lennyy

Reputation: 6132

df1$beatles <- ifelse(df1$name %in% df2$name, 1, 0)

            name instrument beatles
1    John Lennon     guitar       1
2    Mick Jagger     vocals       0
3    Ringo Starr      drums       1
4 Keith Richards     guitar       0

Or if you have other bands in df2 as well, you could do:

df1$beatles <- ifelse(df1$name %in% df2[df2$beatles == 1,]$name, 1, 0)

Upvotes: 4

Create a dummy variable column that indicates if a record is in a second dataframe?

Answers (6)

Related Questions