Dron_Sol

Reputation: 3

Count the number of matches from another dataframe

I have two dataframes, and I want to add a column to df1 that holds, for each pattern in the "pep" column, the number of matches in df2. Please help.

df1 <- data.frame("id" = c(1, 2, 3), pep = c("bb", "dr", "ac"))
df2 <- data.frame("name" = c("a", "b", "c", "d", "e", "f"), "word" = c("drab", "drabbed", "drabbler", "dracaena", "drachma", "academia"))

The result I am looking for in df1:

  id pep n_matches
1  1  bb         2
2  2  dr         5
3  3  ac         3

Thanks

Upvotes: 0

Views: 198

Answers (3)

hello_friend

Reputation: 5798

Base R solution:

transform(
  df1, 
  n_matches = as.integer(
    ave(
      df1$pep, 
      df1$pep,
      FUN = function(x){
        length(
          grep(
            x, 
            df2$word
          )
        )
      }
    )
  )
)

Upvotes: 0

Andre Wildberg

Reputation: 19271

A base R approach

df1$n_matches <- sapply(df1$pep, function(x) length(grep(x,df2$word)))
df1
  id pep n_matches
1  1  bb         2
2  2  dr         5
3  3  ac         3

For completeness, here is an extended example in case you also want to count every occurrence within a word, using str_match_all from the stringr package:

library(stringr)

# Extended example with multiple matches in a word (row 7)
df2 <- rbind(df2, c("g", "drdrbbbb"))
df2
  name     word
1    a     drab
2    b  drabbed
3    c drabbler
4    d dracaena
5    e  drachma
6    f academia
7    g drdrbbbb

df1$n_matches <- sapply(df1$pep, function(x) 
  length(unlist(str_match_all(df2$word, x))))
df1
  id pep n_matches
1  1  bb         4
2  2  dr         7
3  3  ac         3
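
The same totals can also be reproduced without stringr. The following is only a minimal base R sketch, not part of the original answer: count_occurrences is a hypothetical helper name, and fixed = TRUE is an assumption that the values in pep should be treated as literal text rather than regular expressions.

```r
# Data from the question, plus the extra row from the extended example
df1 <- data.frame(id = c(1, 2, 3), pep = c("bb", "dr", "ac"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  name = c("a", "b", "c", "d", "e", "f", "g"),
  word = c("drab", "drabbed", "drabbler", "dracaena",
           "drachma", "academia", "drdrbbbb"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total number of non-overlapping occurrences
# of `pattern` across all elements of `words`
count_occurrences <- function(pattern, words) {
  # fixed = TRUE is an assumption: match the pattern as literal text
  hits <- gregexpr(pattern, words, fixed = TRUE)
  # regmatches() yields one character vector of matches per word
  sum(lengths(regmatches(words, hits)))
}

df1$n_matches <- vapply(df1$pep, count_occurrences, integer(1),
                        words = df2$word, USE.NAMES = FALSE)
df1
```

vapply is used here instead of sapply so that the result is guaranteed to be an integer vector with one value per pattern.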

Upvotes: 2

HarmlessEcon

Reputation: 75

My dplyr solution may look a bit ugly compared to normal dplyr operations:

library(dplyr)
df1 %>% mutate(n_matches = purrr::map_int(pep, function(x) sum(grepl(x, df2$word))))

or

df1 %>% mutate(n_matches = purrr::map_int(pep, function(x) length(grep(x, df2$word))))

Both return:

  id pep n_matches
1  1  bb         2
2  2  dr         5
3  3  ac         3

Explanation:

length(grep("bb", df2$word)) returns the number of words in df2$word that match bb. Inside mutate(), however, pep refers to the whole column vector rather than a single value, and grep() accepts only one pattern at a time. That is why we need an element-wise map function from purrr: it applies the counting function to each value of the column separately instead of to the column vector as a whole.
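
To make the "one pattern at a time" point concrete, here is a small base R sketch (an illustration only, not the code from this answer) that first builds the full table of word-by-pattern matches and then sums each column. The name match_table is made up for this example.

```r
df1 <- data.frame(id = c(1, 2, 3), pep = c("bb", "dr", "ac"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  word = c("drab", "drabbed", "drabbler", "dracaena", "drachma", "academia"),
  stringsAsFactors = FALSE
)

# grepl() takes a single pattern, so it is called once per value of pep.
# Each call returns one TRUE/FALSE per word; sapply() binds the results
# into a logical matrix with one row per word and one column per pattern.
match_table <- sapply(df1$pep, grepl, x = df2$word)

# Column sums give the number of matching words for each pattern.
df1$n_matches <- unname(colSums(match_table))
df1
```

Inspecting match_table shows exactly which words matched which pattern, which can help when a count looks unexpected.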

Upvotes: 0
