Reputation: 1792
I have some two dataframes, df1 and df2, that take a format like this (reproducible code for data entry at the end):
df1
#> name instrument
#> 1 John Lennon guitar
#> 2 Mick Jagger vocals
#> 3 Ringo Starr drums
#> 4 Keith Richards guitar
and
df2
#> name beatles
#> 1 John Lennon 1
#> 2 Ringo Starr 1
#> 3 George Harrison 1
#> 4 Paul McCartney 1
I'd like to add a column to df1 that indicates whether a record also in df2 (and equal to 0 if not), so my desired output is:
output
#> name instrument beatles
#> 1 John Lennon guitar 1
#> 2 Mick Jagger vocals 0
#> 3 Ringo Starr drums 1
#> 4 Keith Richards guitar 0
I've tried full_join()
, but that pulls across rows in df2 that aren't in df1, which is not what I want. (i.e. The George Harrison and Paul McCartney rows should not be in the output)
library(tidyverse)
df1 %>%
full_join(df2)
#> Joining, by = "name"
#> name instrument beatles
#> 1 John Lennon guitar 1
#> 2 Mick Jagger vocals NA
#> 3 Ringo Starr drums 1
#> 4 Keith Richards guitar NA
#> 5 George Harrison <NA> 1
#> 6 Paul McCartney <NA> 1
Reproducible code below:
df1 <- data.frame(stringsAsFactors=FALSE,
name = c("John Lennon", "Mick Jagger", "Ringo Starr", "Keith Richards"),
instrument = c("guitar", "vocals", "drums", "guitar")
)
df2 <- data.frame(stringsAsFactors=FALSE,
name = c("John Lennon", "Ringo Starr", "George Harrison",
"Paul McCartney"),
beatles = c(1, 1, 1, 1)
)
library(tidyverse)
df1 %>%
full_join(df2)
Upvotes: 3
Views: 107
Reputation: 47330
We can use {powerjoin} and use the fill
parameter to fill unmatched values with 0
.
library(powerjoin)
power_left_join(df1, df2, beatles , by="name", fill = 0)
#> name instrument beatles
#> 1 John Lennon guitar 1
#> 2 Mick Jagger vocals 0
#> 3 Ringo Starr drums 1
#> 4 Keith Richards guitar 0
Upvotes: 1
Reputation: 6496
A data.table solution for this left join would be:
df1[df2, on = "name", nomatch = 0]
Upvotes: 3
Reputation: 141
Left join then mutate to replace the NAs with 0s
> library(dplyr)
> output <- df1 %>% left_join(df2, "name") %>% mutate(beatles=ifelse(!is.na(beatles),1,0))
> output
name instrument beatles
1 John Lennon guitar 1
2 Mick Jagger vocals 0
3 Ringo Starr drums 1
4 Keith Richards guitar 0
Upvotes: 2
Reputation: 389135
We can also match
the name
of the two dataframe and specify nomatch
argument to 0. Assign 1 if the value is greater than 0. The +
argument in the front converts logical value to integer.
df1$beatles <- +(match(df1$name, df2$name, nomatch = 0) > 0)
df1
# name instrument beatles
#1 John Lennon guitar 1
#2 Mick Jagger vocals 0
#3 Ringo Starr drums 1
#4 Keith Richards guitar 0
Upvotes: 1
Reputation: 1666
Another solution:
library(magrittr)
df1 <- data.frame(stringsAsFactors = FALSE,
name = c("John Lennon", "Mick Jagger", "Ringo Starr", "Keith Richards"),
instrument = c("guitar", "vocals", "drums", "guitar"))
df2 <- data.frame(stringsAsFactors = FALSE,
name = c("John Lennon", "Ringo Starr", "George Harrison",
"Paul McCartney"),
beatles = c(1, 1, 1, 1))
df1 %>%
dplyr::left_join(y = df2,
by = "name") %>%
tidyr::replace_na(replace = list(beatles = 0))
#> name instrument beatles
#> 1 John Lennon guitar 1
#> 2 Mick Jagger vocals 0
#> 3 Ringo Starr drums 1
#> 4 Keith Richards guitar 0
Upvotes: 2
Reputation: 6132
df1$beatles <- ifelse(df1$name %in% df2$name, 1, 0)
name instrument beatles
1 John Lennon guitar 1
2 Mick Jagger vocals 0
3 Ringo Starr drums 1
4 Keith Richards guitar 0
Or if you have other bands in df2 as well, you could do:
df1$beatles <- ifelse(df1$name %in% df2[df2$beatles == 1,]$name, 1, 0)
Upvotes: 4