Reputation: 525
I have a string of text and a vector of words:
String: "Auch ein blindes Huhn findet einmal ein Korn."
Vector: "auch", "ein"
I want to check how often each word in the vector is contained in the string and calculate the sum of the frequencies. For the example, the correct result would be 3.
I have come so far as to be able to check which words occur in the string and calculate the sum:
library(stringr)
deu <- c("\\bauch\\b", "\\bein\\b")
str_detect(tolower("Auch ein blindes Huhn findet einmal ein Korn."), deu)
[1] TRUE TRUE
sum(str_detect(tolower("Auch ein blindes Huhn findet einmal ein Korn."), deu))
[1] 2
Unfortunately, str_detect does not return the number of occurrences (1, 2), but only whether a word occurs in the string (TRUE, TRUE), so the sum of the output from str_detect is not equal to the total number of matches.
Is there a function in R similar to preg_match_all in PHP?
preg_match_all("/\bauch\b|\bein\b/i", "Auch ein blindes Huhn findet einmal ein Korn.", $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => Auch
[1] => ein
[2] => ein
)
)
echo preg_match_all("/\bauch\b|\bein\b/i", "Auch ein blindes Huhn findet einmal ein Korn.", $matches);
3
I would like to avoid loops.
I have looked at a lot of similar questions, but they either don't count the number of occurrences or do not use a vector of patterns to search. I may have overlooked a question that answers mine, but before you mark this as duplicate, please make sure that the "duplicate" actually asks the exact same thing. Thank you.
Upvotes: 3
Views: 83
Reputation: 7979
Character String Processing
If base R is too complex in its syntax, I would go with {stringi}
stringi::stri_count_regex(tolower(String), sprintf('\\b%s\\b', Vector)) |>
setNames(Vector) # optional
auch ein
1 2
Data
String = 'Auch ein blindes Huhn findet einmal ein Korn.'
Vector = c('auch', 'ein')
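As a rough analogue of the PHP call in the question, here is a minimal sketch that assumes stringi's stri_extract_all_regex and stri_count_regex with a case-insensitive regex option, so tolower is not needed. The single alternation pattern and the names pattern, opts, matches, and total are illustrative, and the sketch assumes the words contain no regex metacharacters.

```r
library(stringi)

String <- "Auch ein blindes Huhn findet einmal ein Korn."
Vector <- c("auch", "ein")

# One alternation pattern, similar to the PHP regex in the question
pattern <- sprintf("\\b(%s)\\b", paste(Vector, collapse = "|"))
opts <- stri_opts_regex(case_insensitive = TRUE)

# All matches, comparable to $matches[0] from preg_match_all
matches <- stri_extract_all_regex(String, pattern, opts_regex = opts)[[1]]
matches  # "Auch" "ein" "ein"

# Total count, comparable to the return value of preg_match_all
total <- stri_count_regex(String, pattern, opts_regex = opts)
total    # 3
```

This gives only the overall total; the per-word counts still need one pattern per word, as in the call above.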
Upvotes: 2
Reputation: 102529
Given a string and a vector of patterns like below
s <- "Auch ein blindes Huhn findet einmal ein Korn."
p <- c("auch", "ein")
you can try strsplit + %in%:
> sum(gsub("\\W", "", strsplit(tolower(s), " ")[[1]]) %in% p)
[1] 3
Alternatively, use table if you would like to see a summary of the counts:
> table(gsub("\\W", "", strsplit(tolower(s), " ")[[1]]))[p]
auch ein
1 2
Upvotes: 2
Reputation: 4147
You can use str_count, like this:
stringr::str_count(tolower("Auch ein blindes Huhn findet mal ein Korn"), paste0("\\b", tolower(c("ein","Huhn")), "\\b"))
[1] 2 1
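Applied to the sentence and words from the question, the same idea can be sketched as follows. The variable names s, words, patterns, and counts are illustrative, and the sketch assumes the words contain no regex metacharacters.

```r
library(stringr)

s <- "Auch ein blindes Huhn findet einmal ein Korn."
words <- c("auch", "ein")

# Wrap each word in word boundaries so "ein" does not match inside "einmal"
patterns <- paste0("\\b", words, "\\b")

# str_count() is vectorised over the patterns: one count per word
counts <- str_count(tolower(s), patterns)
names(counts) <- words
counts       # auch: 1, ein: 2

# Total number of matches across all words
sum(counts)  # 3
```

Summing the per-word counts gives the total the question asks for, without an explicit loop.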
Upvotes: 5
Reputation: 73562
You could build the patterns with sprintf, adding \\b for word boundaries, and use lengths on the result of gregexpr.
> vp <- v |> sprintf(fmt='\\b%s\\b') |> setNames(v) |> print()
auch ein
"\\bauch\\b" "\\bein\\b"
> lapply(vp, gregexpr, text=tolower(string)) |> unlist(recursive=FALSE) |> lengths()
auch ein
1 2
The |> print() is just for assigning and printing at the same time and can be removed.
Data:
string <- "Auch ein blindes Huhn findet einmal ein Korn."
v <- c("auch", "ein")
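One caveat with taking lengths on gregexpr directly: when a pattern has no match, gregexpr returns -1, which still has length 1, so a word that does not occur would be counted once. A minimal base R sketch that avoids this by extracting the matches with regmatches first is shown here. count_matches is a hypothetical helper name, and the third word is added only to show the zero case.

```r
string <- "Auch ein blindes Huhn findet einmal ein Korn."
v <- c("auch", "ein", "katze")  # "katze" does not occur in the string

# Hypothetical helper: number of whole-word matches of `word` in `text`
count_matches <- function(word, text) {
  pattern <- sprintf("\\b%s\\b", word)
  m <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  # regmatches() yields character(0) when nothing matched, so the length is 0
  length(regmatches(text, m)[[1]])
}

counts <- vapply(v, count_matches, integer(1), text = string)
counts       # auch: 1, ein: 2, katze: 0
sum(counts)  # 3
```

Using ignore.case = TRUE here replaces the tolower call; either approach should give the same counts for this input.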
Upvotes: 3