Reputation: 113
I have a string and need to count how many times a given value appears consecutively. I tried the stringr package, but its str_count function counts every occurrence of the value/pattern. For example, when counting "213" in the string "2132132132137889213", the output I need is 4 (the longest consecutive run), but str_count gives me 5. Please help.
Upvotes: 1
Views: 436
Reputation: 887391
Another way would be:
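fun1 <- function(pat, text) {
  max_rep_pat1 <- function(pat, text) {
    # Surround every match with spaces so scan() returns it as its own token
    text1 <- gsub(pat, paste(" ", pat, " "), text)
    # Run-length encode the "token equals pat" indicator
    rl <- rle(scan(text = text1, what = "", quiet = TRUE) == pat)
    # Longest run of consecutive matches
    max(rl$lengths[rl$values])
  }
  # Vectorise over pat and text; drop the names mapply() adds
  setNames(mapply(max_rep_pat1, pat, text), NULL)
}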
str1 <- c("2132132132137889213", "21321321321378892132132132132132213213")
str2 <- "213421342134213477"
fun1("2134", str2)
#[1] 4
fun1("213", str1)
#[1] 4 5
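To see what fun1 does internally, the intermediate values for the first example string can be inspected step by step. This is only an illustrative walk-through; the variable names tokens and is_pat are introduced here and are not part of the answer's code.

```r
pat  <- "213"
text <- "2132132132137889213"

# Step 1: pad every match with spaces, then split on whitespace
tokens <- scan(text = gsub(pat, paste(" ", pat, " "), text),
               what = "", quiet = TRUE)
tokens
#[1] "213"  "213"  "213"  "213"  "7889" "213"

# Step 2: flag the tokens that equal the pattern and run-length encode them
is_pat <- tokens == pat
rl <- rle(is_pat)
rl$lengths
#[1] 4 1 1
rl$values
#[1]  TRUE FALSE  TRUE

# Step 3: the longest run of TRUE is the answer
longest <- max(rl$lengths[rl$values])
longest
#[1] 4
```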
Upvotes: 2
Reputation: 13122
I'm not sure of my "regex" skills but, hopefully, you could make something out of this:
max_rep_pat = function(pat, text)
{
    res = gregexpr(paste0("(", pat, ")+"), text)
    sapply(res, function(x) max(attr(x, "match.length")) / nchar(pat))
}
max_rep_pat("213", c("2132132132137889213",
"21321321321378892132132132132132213213"))
#[1] 4 5
gregexpr returns the position of every match of a pattern together with the length (in characters) of each match. Wrapping the pattern in "(pattern)+" means 'find one or more consecutive repetitions of the pattern'. Compare the following two calls:
gregexpr("213", "2132132132137889213")
#[[1]]
#[1]  1  4  7 10 17
#attr(,"match.length")
#[1] 3 3 3 3 3
#attr(,"useBytes")
#[1] TRUE
gregexpr("(213)+", "2132132132137889213")
#[[1]]
#[1]  1 17
#attr(,"match.length")
#[1] 12  3
#attr(,"useBytes")
#[1] TRUE
In the first case, it found the position of each "213", and the length of each match is just nchar of the pattern. In the second case, it found every run of consecutive "213"s: there are two runs, the first with 12 / 3 = 4 repetitions and the second with 3 / 3 = 1 repetition. Taking max(attr(x, "match.length")) / nchar(pat) therefore gives 4.
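One edge case worth noting: when the pattern does not occur at all, gregexpr reports a single match.length of -1, so the division above would return a negative fraction. A minimal sketch of a variant that returns 0 in that case follows. The name max_rep_safe is hypothetical, and the sketch assumes the pattern contains no regex metacharacters, as in the digits-only examples here.

```r
# Hypothetical variant (not from the answer above): returns 0 when pat is absent.
# Assumption: pat has no regex metacharacters (e.g. it is digits only).
max_rep_safe <- function(pat, text) {
  res <- gregexpr(paste0("(", pat, ")+"), text)
  vapply(res, function(x) {
    len <- attr(x, "match.length")
    # gregexpr signals "no match" with a single match.length of -1
    if (len[1] == -1) 0 else max(len) / nchar(pat)
  }, numeric(1))
}

max_rep_safe("213", c("2132132132137889213", "7889"))
#[1] 4 0
```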
Upvotes: 3