Reputation: 1
I'm trying to create a dummy variable in df2 that flags whether each row of df2 also appears in df1. Note that df2 has more columns than df1, so the match should only use the columns they share.
For example:
df1:
A | B | C |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 0 |
df2:
A | B | C | D |
---|---|---|---|
1 | 2 | 3 | E |
4 | 5 | 6 | F |
7 | 8 | 9 | G |
Desired result (df2):
A | B | C | D | Dummy |
---|---|---|---|---|
1 | 2 | 3 | E | 1 |
4 | 5 | 6 | F | 1 |
7 | 8 | 9 | G | 0 |
Any good approaches I should consider?
I've tried applying ifelse to the data frame, but I suspect I've coded it wrong. Any tips would be appreciated!
Upvotes: 0
Views: 418
Reputation: 1527
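One approach would be to add a column called "dummy" to df1, then left-join df2 to df1 on all of df1's original columns. Rows of df2 with no match in df1 get NA in dummy, which can then be replaced with 0.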
df1$dummy <- 1  # flag every row of df1
library(dplyr)
dplyr::left_join(df2, df1) %>%
  mutate(dummy = ifelse(is.na(dummy), 0, dummy))  # unmatched rows become 0
# Joining, by = c("A", "B", "C")
# A B C D dummy
# 1 2 3 E 1
# 4 5 6 F 1
# 7 8 9 G 0
By default, left_join joins on every column whose name appears in both data frames. To join on a specific set of columns instead, pass them through the `by` argument.
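For reference, here is a minimal sketch of the same idea with the join columns spelled out, assuming the example data from the question. The names `key_cols`, `lookup`, and `result` are illustrative and not part of the original answer. It also de-duplicates the keys from df1 first, since a left join repeats a df2 row once for every matching row in df1.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 0))
df2 <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9),
                  D = c("E", "F", "G"))

# Columns to match on (assumed here to be all of df1's columns)
key_cols <- c("A", "B", "C")

# One row per distinct key in df1, flagged with dummy = 1
lookup <- df1 %>%
  select(all_of(key_cols)) %>%
  distinct() %>%
  mutate(dummy = 1)

# Keep every row of df2; rows without a match get NA, replaced by 0
result <- df2 %>%
  left_join(lookup, by = key_cols) %>%
  mutate(dummy = coalesce(dummy, 0))

print(result)
```

Building a separate lookup table leaves df1 itself unchanged, which may be preferable if df1 is used elsewhere.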
Upvotes: 1