MHernandez22

Reputation: 89

How to identify the most repeated consecutive character in a given string in R

I have a problem that I have no idea how to approach. Suppose you have the string "aabccccdeddaaa". The program needs to return the character with the longest consecutive run, along with the length of that run. One might think the answer is "a" because it appears 5 times in the string, but that's not what I'm looking for. The correct answer for my problem is "c": although it appears only 4 times, those 4 occurrences are consecutive, whereas "a" repeats at most 3 times consecutively.

Not looking for the solution though, only for some guidance on how to start.

Upvotes: 2

Views: 153

Answers (2)

Tim Biegeleisen

Reputation: 521028

One approach would be to split the input string at every point where the preceding and following characters differ. Then sort the resulting vector of parts by length, descending (ties are broken alphabetically), to find the longest run:

x <- "aabccccdeddaaa"
parts <- strsplit(x, "(?<=(.))(?!\\1)", perl=TRUE)[[1]]
parts[order(-nchar(parts), parts)][1]

[1] "cccc"

For reference, here is the vector of terms:

parts
[1] "aa"   "b"    "cccc" "d"    "e"    "dd"   "aaa"
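
Since the question asks for the character and its count rather than the run itself, both can be read off the longest part. This is only a minimal sketch building on the split above; the variable names longest, most_repeated and run_length are made up for illustration.

```r
x <- "aabccccdeddaaa"
parts <- strsplit(x, "(?<=(.))(?!\\1)", perl = TRUE)[[1]]
longest <- parts[order(-nchar(parts), parts)][1]

# Each part is a single repeated character, so its first character names it
most_repeated <- substr(longest, 1, 1)
# The length of the part is the number of consecutive repetitions
run_length <- nchar(longest)
```

Here most_repeated holds the character and run_length the number of consecutive repetitions.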

Upvotes: 2

hmhensen

Reputation: 3195

Just went ahead and did it. You need a combination of a few functions. The main one is rle, which computes the lengths of runs of consecutive equal values. The rest is just combining some basic functions to extract the elements of the rle result that you need.

# Run-length encode the characters once and reuse the result
runs <- rle(strsplit("aabccccdeddaaa", "")[[1]])

# Which letter is repeated most
runs$values[which.max(runs$lengths)]
[1] "c"

# How many times it's repeated
max(runs$lengths)
[1] 4
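
If you need this for more than one string, the rle approach could be wrapped in a small function that returns both values at once. This is a rough sketch, not a definitive implementation: longest_run is a hypothetical name (not a base R function), and it assumes a single non-empty string as input.

```r
# longest_run() is a hypothetical helper name, not part of base R.
# Assumes s is a single non-empty character string.
longest_run <- function(s) {
  # Split into single characters and run-length encode them
  runs <- rle(strsplit(s, "")[[1]])
  # which.max() returns the first maximum, so ties go to the earliest run
  i <- which.max(runs$lengths)
  list(char = runs$values[i], count = runs$lengths[i])
}

res <- longest_run("aabccccdeddaaa")
res$char
res$count
```

Note that when two runs tie for the longest, which.max picks the one that occurs first in the string.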

Upvotes: 3
