Reputation: 3
Help me, please :) I have two data frames, and I want to add a column to df1 with the number of matches in df2 (the "word" column) for each pattern in the "pep" column.
df1 <- data.frame(id = c(1, 2, 3), pep = c("bb", "dr", "ac"))
df2 <- data.frame(name = c("a", "b", "c", "d", "e", "f"), word = c("drab", "drabbed", "drabbler", "dracaena", "drachma", "academia"))
The result I'm looking for in df1 is:
id pep n_matches
1 1 bb 2
2 2 dr 5
3 3 ac 3
Thanks
Upvotes: 0
Views: 198
Reputation: 5798
Base R solution:
# for each pep value, count how many words in df2$word contain it
transform(
  df1,
  n_matches = as.integer(
    ave(df1$pep, df1$pep, FUN = function(x) length(grep(x, df2$word)))
  )
)
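One detail worth knowing about the ave() approach: ave() returns a vector of the same type as its first argument, so with a character pep column the counts come back as character strings, which is why the result is wrapped in as.integer(). A minimal illustration, reusing df2 from the question (the pats vector is just a stand-in for df1$pep):

```r
df2 <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  word = c("drab", "drabbed", "drabbler", "dracaena", "drachma", "academia")
)
pats <- c("bb", "dr", "ac")  # stand-in for df1$pep

# ave() writes each group's result back into a copy of `pats`,
# so the integer counts are coerced to character strings
raw_counts <- ave(pats, pats, FUN = function(x) length(grep(x, df2$word)))
print(raw_counts)  # "2" "5" "3"

# as.integer() turns them back into numbers
counts <- as.integer(raw_counts)
print(counts)
```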
Upvotes: 0
Reputation: 19271
A base R approach
df1$n_matches <- sapply(df1$pep, function(x) length(grep(x,df2$word)))
df1
id pep n_matches
1 1 bb 2
2 2 dr 5
3 3 ac 3
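A variation on the sapply() version, offered as a sketch rather than part of the answer above: vapply() with integer(1) guarantees an integer result, and fixed = TRUE makes grepl() treat each pep value as a literal string instead of a regular expression. That only matters if the patterns could contain regex metacharacters such as "." or "+" (an assumption about your data); drop fixed = TRUE if you do want regex matching.

```r
df1 <- data.frame(id = c(1, 2, 3), pep = c("bb", "dr", "ac"))
df2 <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  word = c("drab", "drabbed", "drabbler", "dracaena", "drachma", "academia")
)

# For each pattern, count the words in df2 that contain it at least once.
# fixed = TRUE matches the pattern literally, not as a regex
# (assumption: the pep values are plain strings).
df1$n_matches <- vapply(
  df1$pep,
  function(x) sum(grepl(x, df2$word, fixed = TRUE)),
  integer(1),
  USE.NAMES = FALSE
)
df1
```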
For completeness, here is an extended example that also counts every occurrence within a single word, using str_match_all from the stringr package:
library(stringr)
# extended example with multiple matches in a word (row 7)
df2 <- rbind(df2, data.frame(name = "g", word = "drdrbbbb"))
df2
name word
1 a drab
2 b drabbed
3 c drabbler
4 d dracaena
5 e drachma
6 f academia
7 g drdrbbbb
df1$n_matches <- sapply(df1$pep, function(x)
  length(unlist(str_match_all(df2$word, x))))
df1
id pep n_matches
1 1 bb 4
2 2 dr 7
3 3 ac 3
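If you would rather not depend on stringr, a base R sketch of the same "count every occurrence" idea uses gregexpr() with regmatches(); this is an alternative to the str_match_all() call above, not part of the original answer, and count_all is a hypothetical helper name. Like str_match_all(), it counts non-overlapping matches, so "bbbb" contributes two matches of "bb".

```r
df1 <- data.frame(id = c(1, 2, 3), pep = c("bb", "dr", "ac"))
df2 <- data.frame(
  name = c("a", "b", "c", "d", "e", "f", "g"),
  word = c("drab", "drabbed", "drabbler", "dracaena", "drachma",
           "academia", "drdrbbbb")
)

# Hypothetical helper: gregexpr() finds every non-overlapping match in each
# word, regmatches() extracts them as one character vector per word,
# lengths() counts them per word, and sum() totals across all words.
count_all <- function(pattern, words) {
  sum(lengths(regmatches(words, gregexpr(pattern, words))))
}

df1$n_matches <- vapply(df1$pep, count_all, integer(1),
                        words = df2$word, USE.NAMES = FALSE)
df1
```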
Upvotes: 2
Reputation: 75
My dplyr solution may look a bit clunky compared to typical dplyr code:
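library(dplyr)
# map_int() (rather than map()) returns a plain integer column instead of a list column
df1 %>% mutate(n_matches = purrr::map_int(pep, function(x) sum(grepl(x, df2$word))))
or
df1 %>% mutate(n_matches = purrr::map_int(pep, function(x) length(grep(x, df2$word))))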
Both return:
id pep n_matches
1 1 bb 2
2 2 dr 5
3 3 ac 3
Explanation:
length(grep("bb", df2$word))
returns the number of words in df2$word that match "bb". Inside mutate(), however, pep refers to the whole column vector, and grep()/grepl() are not vectorized over the pattern: given several patterns they use only the first one (with a warning). That is why a purrr map function is needed, to apply the function to each value in the column separately rather than to the whole vector at once.
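To see that problem concretely, here is a small base R sketch: passing the whole pep vector to grepl() does not give one count per pattern, because only the first pattern is used. Applying the function element by element, which is what the purrr map functions do, gives the intended counts. vapply() stands in for purrr here only so the example has no package dependencies.

```r
df2 <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  word = c("drab", "drabbed", "drabbler", "dracaena", "drachma", "academia")
)
pep <- c("bb", "dr", "ac")

# Vectorized call: only "bb" is used as the pattern (R warns about this)
wrong <- suppressWarnings(sum(grepl(pep, df2$word)))
print(wrong)  # 2, the count for "bb" alone

# Element-wise call: one count per pattern, named by pattern
right <- vapply(pep, function(x) sum(grepl(x, df2$word)), integer(1))
print(right)
```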
Upvotes: 0