Reputation: 61
My two dataframes:
df1
Col1
A
B
C
df2
Col1
A
D
E
F
I would like to add a 2nd column, Col2, to df1 where each value in the column is 1 if it's respective value in Col1 is also in Col1 of df2. Thus df1 would look like this:
df1
Col1 Col2
A 1
B 0
C 0
Thanks!
Upvotes: 1
Views: 134
Reputation: 6542
Two options using data.table
First one uses %chin%
operator :
library(data.table)
x = data.table(v = LETTERS[1:3])
y = data.table(v = c("A","D","E","F"))
x[, found:= v %chin% y$v]
x
#> v found
#> 1: A TRUE
#> 2: B FALSE
#> 3: C FALSE
The second one is built on merging behaviour:
library(data.table)
x = data.table(v = LETTERS[1:3])
y = data.table(v = c("A","D","E","F"))
y[, found := TRUE]
x[, found:= y[.SD, .(ifelse(is.na(found), FALSE, TRUE)), on = .(v)]]
x
#> v found
#> 1: A TRUE
#> 2: B FALSE
#> 3: C FALSE
EDIT: Based on @frank comment, you could simplify with no ifelse
- it is the same
x[, found:= y[.SD, !is.na(found), on = .(v)]]
x
#> v found
#> 1: A TRUE
#> 2: B FALSE
#> 3: C FALSE
For understanding what happens, here is the inside code I built on:
x[, found := NULL]
y[x, on = .(v)]
#> v found
#> 1: A TRUE
#> 2: B NA
#> 3: C NA
Upvotes: 1
Reputation: 39154
df3
is the final output.
library(dplyr)
df1 <- data_frame(Col1 = c("A", "B", "C"))
df2 <- data_frame(Col1 = c("A", "D", "E", "F"))
df3 <- df1 %>% mutate(Col2 = as.numeric(Col1 %in% df2$Col1))
Or the following approach is similar to HarlandMason's method but using dplyr
and tidyr
.
library(dplyr)
library(tidyr)
df3 <- df2 %>%
mutate(Col2 = 1) %>%
right_join(df1, by = "Col1") %>%
replace_na(list(Col2 = 0))
Upvotes: 1
Reputation: 789
Add the col2 to df2
df2$Col2 <- 1
Perform a left-join merge:
df3 <- merge(df1, df2, all.x=T, by='Col1')
Replace the NAs with zeros
df3$Col2[which(is.na(df3$Col2))] <- 0
df3 is now
Col1 Col2
1 A 1
2 B 0
3 C 0
Edit: @ycw has done it more concisely using as.numeric
and %in%
. I like his answer, but I thought I'd edit mine to include a version of his work that doesn't use dplyr:
It's as simple as df1$Col2 <- as.numeric(df1$Col1 %in% df2$Col1)
. Much better than mine!
Upvotes: 2