user2904120

Reputation: 416

Count number of multiple pattern matches in a string

I would like to count multiple pattern matches in a dataframe column containing long strings.

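library(stringr)

pattern <- c("AAA", "BBB", "CCC")

# one count column per string column and pattern
# (distinct names, so the string_2 counts do not overwrite the string_1 counts)
df$string_1_AAA <- str_count(df$string_1, "AAA+")
df$string_1_BBB <- str_count(df$string_1, "BBB+")
df$string_1_CCC <- str_count(df$string_1, "CCC+")
df$string_2_AAA <- str_count(df$string_2, "AAA+")
df$string_2_BBB <- str_count(df$string_2, "BBB+")
df$string_2_CCC <- str_count(df$string_2, "CCC+")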
...

In reality, the vector "pattern" is much longer, so I need to loop over the patterns and over the columns that contain the strings.

Upvotes: 1

Views: 651

Answers (1)

d.b

Reputation: 32548

You could loop over the patterns with lapply and, for each pattern, over the columns with sapply:

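# DATA
library(stringr)

pattern <- c("AAA", "BBB", "CCC")
set.seed(42)
# two columns, each holding five random 50-character strings over A, B, C
df <- data.frame(a = replicate(5, paste(sample(c("A", "B", "C"), 50, TRUE), collapse = "")),
                 b = replicate(5, paste(sample(c("A", "B", "C"), 50, TRUE), collapse = "")),
                 stringsAsFactors = FALSE)

# for each pattern, count its matches in every column of df;
# the result is a named list with one matrix (rows x columns of df) per pattern
setNames(lapply(pattern, function(x) sapply(df, function(y)
                              str_count(string = y, pattern = x))), pattern)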
#$AAA
#     a b
#[1,] 0 0
#[2,] 2 1
#[3,] 0 2
#[4,] 4 1
#[5,] 2 2

#$BBB
#     a b
#[1,] 1 2
#[2,] 1 0
#[3,] 2 3
#[4,] 1 2
#[5,] 2 1

#$CCC
#     a b
#[1,] 1 0
#[2,] 2 1
#[3,] 2 0
#[4,] 2 0
#[5,] 0 1
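
If you would rather have the counts as new columns of the data frame, as in the question, the same idea can be written as two nested loops. This is only a sketch: the small data frame and the column names of the form "string_1_AAA" are assumptions made for illustration, not something taken from the question's real data.

```r
library(stringr)

pattern <- c("AAA", "BBB", "CCC")

# small hand-made data frame (hypothetical values, chosen so the counts are easy to check)
df <- data.frame(string_1 = c("AAABBBAAA", "CCCC", "ABCABC"),
                 string_2 = c("BBBBBB", "AAACCC", ""),
                 stringsAsFactors = FALSE)

# add one count column per (string column, pattern) pair, e.g. "string_1_AAA"
for (col in c("string_1", "string_2")) {
  for (p in pattern) {
    df[[paste(col, p, sep = "_")]] <- str_count(df[[col]], p)
  }
}

# str_count counts non-overlapping matches, so the regex matters:
n_fixed <- str_count("AAAAAA", "AAA")   # 2: two separate "AAA" matches
n_runs  <- str_count("AAAAAA", "AAA+")  # 1: one run of three or more A's
```

Note that the question uses "AAA+" while the code above uses "AAA". If you want to count runs of three or more identical letters rather than non-overlapping triples, pass paste0(p, "+") as the pattern instead of p.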

Upvotes: 3
