user3845968

Reputation: 113

Counting the number of consecutive appearances of a given value

I have a string and need to count how many times a given value appears consecutively. I tried the stringr package, but it counts every occurrence of the value/pattern. For example, when counting "213" in the string "2132132132137889213", the output I need is 4; however, I get 5 from stringr's str_count function. Please help.

Upvotes: 1

Views: 436

Answers (2)

akrun

Reputation: 887391

Another way would be:

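fun1 <- function(pat, text) {
    # Longest run of consecutive `pat` within a single string
    max_rep_pat1 <- function(pat, text) {
        # Surround every match with spaces so scan() reads it as its own token
        text1 <- gsub(pat, paste(" ", pat, " "), text)
        # Run-length encode the "token equals pat" flags
        rl <- rle(scan(text = text1, what = "", quiet = TRUE) == pat)
        # Keep only the runs of matches and return the longest one
        max(rl$lengths[rl$values])
    }
    # Apply to every input string and drop the names mapply() adds
    setNames(mapply(max_rep_pat1, pat, text), NULL)
}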

str1 <- c("2132132132137889213", "21321321321378892132132132132132213213")
str2 <- "213421342134213477"
fun1("2134", str2)
#[1] 4
fun1("213", str1)
#[1] 4 5

Upvotes: 2

alexis_laz

Reputation: 13122

I'm not sure of my "regex" skills but, hopefully, you could make something out of this:

max_rep_pat = function(pat, text)
{
   res = gregexpr(paste0("(", pat, ")+"), text)
   sapply(res, function(x) max(attr(x, "match.length")) / nchar(pat))
}
max_rep_pat("213", c("2132132132137889213", 
                     "21321321321378892132132132132132213213"))
#[1] 4 5

gregexpr returns the positions where a pattern occurred and the number of characters in each match. Wrapping the pattern in "(pattern)+" means 'find one or more consecutive repetitions of the pattern'. Compare the following two:

gregexpr("213", "2132132132137889213") 
#[[1]]
#[1]  1  4  7 10 17
#attr(,"match.length")
#[1] 3 3 3 3 3
#attr(,"useBytes")
#[1] TRUE

gregexpr("(213)+", "2132132132137889213") 
#[[1]]
#[1]  1 17
#attr(,"match.length")
#[1] 12  3
#attr(,"useBytes")
#[1] TRUE

In the first case, it found the position of each "213", and the length of each match is just the nchar of the pattern. In the second case, it found every run of repeated "213", and we see that such runs occurred twice: the first with 12 / 3 = 4 repetitions and the second with 3 / 3 = 1 repetition. Taking max(attr(x, "match.length")) / nchar(pattern) gives the 4 we want.
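One thing to watch for: gregexpr returns -1 (with a match.length of -1) when the pattern does not occur at all, so the division above would give a negative fraction in that case. Also, the pattern is treated as a regular expression, so characters such as "." or "+" are not matched literally. As a rough sketch of one way around both points, here is a variant that matches the pattern as a fixed string and counts runs from the match positions. The name max_rep_fixed is made up for this illustration and is not part of either answer:

```r
# Hypothetical helper (an illustration, not from the answers above):
# longest run of back-to-back literal copies of `pat` in each element of `text`.
max_rep_fixed <- function(pat, text) {
  n <- nchar(pat)
  # fixed = TRUE matches `pat` literally, so "." or "+" need no escaping
  starts <- gregexpr(pat, text, fixed = TRUE)
  vapply(starts, function(pos) {
    # gregexpr() reports -1 when there is no match at all
    if (pos[1] == -1) return(0)
    # A new run begins whenever the gap to the previous match is not exactly n
    run_id <- cumsum(c(TRUE, diff(pos) != n))
    # Size of the largest run
    max(tabulate(run_id))
  }, numeric(1))
}

max_rep_fixed("213", c("2132132132137889213",
                       "21321321321378892132132132132132213213"))
#[1] 4 5
max_rep_fixed("999", "2132132132137889213")
#[1] 0
```

On the two example strings this gives the same 4 and 5 as the regex version; returning 0 for "no match" is an assumption about what you would want there.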

Upvotes: 3
