I would like to count the matches of multiple patterns in data frame columns that contain long strings.
library(stringr)
pattern <- c("AAA", "BBB", "CCC")
df$string_1_AAA <- str_count(df$string_1, "AAA+")
df$string_1_BBB <- str_count(df$string_1, "BBB+")
df$string_1_CCC <- str_count(df$string_1, "CCC+")
df$string_2_AAA <- str_count(df$string_2, "AAA+")
df$string_2_BBB <- str_count(df$string_2, "BBB+")
df$string_2_CCC <- str_count(df$string_2, "CCC+")
# ...
In reality the pattern vector is much longer, so I need to loop over the patterns for each column of strings instead of writing one line per pattern.
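Note that the str_count calls above use "AAA+" while the pattern vector holds "AAA", and the two regexes do not count the same thing. A minimal check, assuming the stringr package is installed:

```r
library(stringr)

pattern <- c("AAA", "BBB", "CCC")

# "AAA" counts non-overlapping occurrences of exactly three A's,
# while "AAA+" counts runs of three or more A's (the + is greedy,
# so a run of six A's is consumed as a single match).
str_count("AAAAAA", "AAA")   # 2
str_count("AAAAAA", "AAA+")  # 1

# To keep the "+" behaviour when looping over `pattern`,
# append it to every element first.
pattern_plus <- paste0(pattern, "+")
pattern_plus
```

So looping over pattern directly gives the non-overlapping "AAA" counts, whereas looping over paste0(pattern, "+") reproduces the hand-written calls.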
You could loop over the patterns with lapply and over the columns with sapply:
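library(stringr)

# DATA: five random 50-character strings of A/B/C in each of two columns
pattern <- c("AAA", "BBB", "CCC")
set.seed(42)
df <- data.frame(a = replicate(5, paste(sample(c("A", "B", "C"), 50, TRUE), collapse = "")),
                 b = replicate(5, paste(sample(c("A", "B", "C"), 50, TRUE), collapse = "")),
                 stringsAsFactors = FALSE)

# Outer lapply: one list element per pattern.
# Inner sapply: one column per column of df, so each element is a
# matrix of counts (rows = rows of df, columns = a, b).
setNames(lapply(pattern, function(x) sapply(df, function(y)
  str_count(string = y, pattern = x))), pattern)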
#$AAA
# a b
#[1,] 0 0
#[2,] 2 1
#[3,] 0 2
#[4,] 4 1
#[5,] 2 2
#$BBB
# a b
#[1,] 1 2
#[2,] 1 0
#[3,] 2 3
#[4,] 1 2
#[5,] 2 1
#$CCC
# a b
#[1,] 1 0
#[2,] 2 1
#[3,] 2 0
#[4,] 2 0
#[5,] 0 1
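If you want the counts added to df as separate columns, as in the question's hand-written code, rather than returned as a list of matrices, a nested for loop is one option. This is a minimal sketch assuming the string columns are character (not factor) columns named string_1 and string_2 as in the question; the three-row df below is made-up example data, and the "<column>_<pattern>" naming scheme is just an illustrative choice:

```r
library(stringr)

pattern <- c("AAA", "BBB", "CCC")

# Hypothetical example data standing in for the question's df;
# the column names string_1 / string_2 are taken from the question.
df <- data.frame(
  string_1 = c("AAABBBCCC", "AAAAAA", "CCCAAA"),
  string_2 = c("BBBBBB", "ABCABC", "AAACCCAAA"),
  stringsAsFactors = FALSE
)

string_cols <- c("string_1", "string_2")

# Nested loop: for every string column and every pattern, add a new
# count column named "<column>_<pattern>", e.g. "string_1_AAA".
for (col in string_cols) {
  for (p in pattern) {
    df[[paste(col, p, sep = "_")]] <- str_count(df[[col]], p)
  }
}

df
```

Each pattern is used here as a regular expression; loop over paste0(pattern, "+") instead if you want the "AAA+" behaviour from the question.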