Reputation: 640
I have two data frames, like so:
name <- c("joe", "kim", "kerry", "david")
name2 <- c("kim", "david", "joe", "kerry")
school <- c("cambridge", "south carolina", "vermont binghamton", "delaware")
school2 <- c("south carolina", "delaware", "cambridge magdalene", "vermont")
df1 <- data.frame(name, school)
df2 <- data.frame(name2, school2)
What I would like to do is the following:
df2$name2
for a match in df1$name
.df2$school2
to df1$school
from the matching row.df2$school2
in df1$school
, return FALSE in column df2$perfect.match
So for example, since "joe" in df2
matches "joe" in df1
, there's a match. However, since the values for "school" in both aren't the same, the would be a column in df2
with a value of FALSE in the third row. Same for 4th row in df2
.
I have tried using grep and grepl. I figure grepl would be best, since it returns a logical value. What I tried was:
df2$perfect.match <- ifelse(grepl(paste(df2$name2, collapse = "|"),
df1$name, fixed = F) & grepl(paste(df2$school2, collapse = "|"), df1$school, fixed = F), "", "FALSE")
however, all I get is this:
name2 school2 perfect.match
1 kim south carolina FALSE
2 david delaware
3 joe cambridge magdalene
4 kerry vermont
When my desired result is:
df2
name2 school2 perfect.match
1 kim south carolina
2 david delaware
3 joe cambridge magdalene FALSE
4 kerry vermont FALSE
If possible, something speedy would be best. The real dataframe are quite large. Thanks.
UPDATE:
I would like to also be able to force the rows that are false to have the same value for df2$school
as their corresponding name
match in df1$school
Like so:
name2 school2
1 kim south carolina
2 david delaware
3 joe cambridge
4 kerry vermont binghamton
Upvotes: 1
Views: 479
Reputation: 388862
We can use match
and %in%
. grepl
wouldn't be right here since this is exact matching and not pattern matching.
df2$perfect_match <- df2$school2 %in% df1$school[match(df2$name2, df1$name)]
df2
# name2 school2 perfect_match
#1 kim south carolina TRUE
#2 david delaware TRUE
#3 joe cambridge magdalene FALSE
#4 kerry vermont FALSE
Upvotes: 2
Reputation: 16178
Using dplyr
, you can do:
dfX <- df1 %>%
bind_rows(.,df2) %>%
group_by(name) %>%
distinct(school) %>%
count(name, name = "perfect.matched") %>%
left_join(df2,.,by = 'name') %>%
mutate(., perfect.matched = ifelse(perfect.matched ==1,"","FALSE"))
And to get the following output:
> dfX
name school perfect.matched
1 kim south carolina
2 david delaware
3 joe cambridge magdalene FALSE
4 kerry vermont FALSE
Upvotes: 2
Reputation: 6956
Slightly faster than pasting the columns together:
matches <- df2$name2 %in% df1$name
df2$perfect.match <- df2$school2[matches] %in% df1$school
microbenchmark::microbenchmark(
v1 = {matches <- df2$name2 %in% df1$name
df2$perfect.match <- df2$school2[matches] %in% df1$school
},
v2 = {df2$perfect.match <- paste(df2$name2, df2$school2) %in% paste(df1$name, df1$school)}
)
Upvotes: 2
Reputation: 18425
You can just do...
df2$perfect.match <- paste(df2$name2, df2$school2) %in% paste(df1$name, df1$school)
df2
name2 school2 perfect.match
1 kim south carolina TRUE
2 david delaware TRUE
3 joe cambridge magdalene FALSE
4 kerry vermont FALSE
Upvotes: 3