S M

Reputation: 100

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurrences?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

CharSequence content = new StringBuffer("aaabbbccaaa");
// A letter followed by at least two more copies of itself (a run of 3 or more)
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";

Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);

StringBuffer buffer = new StringBuffer();

// One find() loop visits every match; an extra find() beforehand would skip the first run
while (matcher.find()) {
    matcher.appendReplacement(buffer, replace);
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());

In the code above, content is the input string.

I am trying to find runs of a repeated character in the string and replace each run with the maximum allowed number of occurrences.

For example:

input: ("abaaadccc", 2)
output: "abaadcc"
Here aaa and ccc are replaced by aa and cc, because the maximum allowed repetition is 2.

In the code above, I find such occurrences and replace them with -, which works. But how can I get the current character and replace the run with the allowed number of occurrences?

i.e. if aaa is found, it should be replaced by aa.

Or is there an alternative method that doesn't use regex?
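A regex-free approach is possible: scan the string once, track the length of the current run of identical characters, and copy a character only while that run is within the limit. The following is a minimal sketch; the class and method names (RunLimiter, limitRuns) are hypothetical. Unlike the regex versions, it compares characters exactly (case-sensitive) and applies to every character, not only letters.

```java
// Regex-free sketch: copy each character only while the current run of
// identical characters is at most maxRepeats long.
// RunLimiter and limitRuns are hypothetical names, not a library API.
class RunLimiter {

    static String limitRuns(String input, int maxRepeats) {
        if (maxRepeats < 1) {
            throw new IllegalArgumentException("maxRepeats must be at least 1");
        }
        StringBuilder out = new StringBuilder(input.length());
        int runLength = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Extend the run if c repeats the previous character, otherwise start a new run
            runLength = (i > 0 && c == input.charAt(i - 1)) ? runLength + 1 : 1;
            if (runLength <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRuns("abaaadccc", 2));   // abaadcc
        System.out.println(limitRuns("aaabbbccaaa", 2)); // aabbccaa
    }
}
```

Because it makes a single pass with a StringBuilder, it runs in linear time with no pattern compilation; restricting it to letters or making it case-insensitive would need extra checks in the loop.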

Upvotes: 3

Views: 791

Answers (2)

default locale

Reputation: 13446

You can declare a second, nested group in the regex for the backreference and use the outer (first) group as the replacement:

String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
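Applied to the question's own example, the one-liner should give the expected result. A small sketch, wrapping the call in a hypothetical limitToTwo method so it can be run directly:

```java
class ReplaceAllDemo {

    // Same pattern as the answer: group 1 = the first two copies of a letter,
    // group 2 = the letter itself; \2+ swallows any further copies.
    // limitToTwo is a hypothetical name used only for illustration.
    static String limitToTwo(String input) {
        return input.replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(limitToTwo("aaabbbccaaa")); // aabbccaa
        System.out.println(limitToTwo("abaaadccc"));   // abaadcc
    }
}
```

This pattern has no case-insensitive flag, so a mixed-case run such as aAa is left alone; adding the embedded flag (?i) at the start of the pattern makes the backreference case-insensitive as well.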

Here's how it works:

(                        first group - the letter twice (this is what is kept)
    ([a-zA-Z])           second group - a single letter
    \2                   the same letter once more
)
\2+                      one or more further copies of that letter (these are dropped)

Thus, the first group captures the replacement string: exactly two copies of the letter, while the extra copies matched by \2+ are discarded.
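To see what each group captures, the same pattern can be run through a Matcher and its groups inspected for every match. A minimal sketch; GroupInspector and describeMatches are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupInspector {

    // Returns one "start-end:group1:group2" entry per match of the answer's pattern.
    static List<String> describeMatches(String input) {
        Matcher m = Pattern.compile("(([a-zA-Z])\\2)\\2+").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":" + m.group(1) + ":" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints 0-3:aa:a, 3-6:bb:b and 8-11:aa:a, one per line
        describeMatches("aaabbbccaaa").forEach(System.out::println);
    }
}
```

The cc run at index 6 is not listed because \2+ requires at least a third copy, so runs already within the limit are never touched.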

The solution is easy to generalize to a different maximum number of allowed repeats:

String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa
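The same idea can be packaged as a reusable method. This sketch assumes maxRepeats must be at least 1 (a value of 0 would produce the quantifier {-1}, which Pattern rejects); limitRepeats is a hypothetical name, and %d is used because the argument is an int:

```java
class RepeatLimiter {

    // Hypothetical helper generalizing the answer's pattern: group 1 keeps
    // maxRepeats copies of a letter, and \2+ matches (and drops) any extra copies.
    static String limitRepeats(String input, int maxRepeats) {
        if (maxRepeats < 1) {
            // maxRepeats - 1 would be negative, giving an invalid quantifier \2{-1}
            throw new IllegalArgumentException("maxRepeats must be at least 1");
        }
        String pattern = String.format("(([a-zA-Z])\\2{%d})\\2+", maxRepeats - 1);
        return input.replaceAll(pattern, "$1");
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats("aaaaabbcccccaaa", 4)); // aaaabbccccaaa
        System.out.println(limitRepeats("abaaadccc", 2));       // abaadcc
        System.out.println(limitRepeats("aaabbb", 1));          // ab
    }
}
```

For maxRepeats = 1 the pattern contains \2{0}, which matches zero copies, so every run collapses to a single letter.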

Upvotes: 2

Conffusion

Reputation: 4475

Since you defined a group in your regex, you can get the text it matched by calling matcher.group(1). In your case it contains the repeated character, so appending it twice gives the expected result for a maximum of 2 repetitions.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    CharSequence content = new StringBuffer("aaabbbccaaa");
    String pattern = "([a-zA-Z])\\1\\1+";

    Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    Matcher matcher = patt.matcher(content);

    StringBuffer buffer = new StringBuffer();

    while (matcher.find()) {
        System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
        matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
    }
    matcher.appendTail(buffer);
    System.out.println(buffer.toString());

Output:

found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
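The loop in this answer hardcodes a maximum of 2 by appending group(1) twice. Below is a sketch of the same appendReplacement loop with a configurable maximum; it assumes Java 11 or later for String.repeat, and the class and method names are hypothetical. The pattern ([a-zA-Z])\1{n,} matches a letter followed by at least n more copies, i.e. any run longer than n.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementLimiter {

    // Generalizes the answer's loop: append group(1) maxRepeats times instead of twice.
    // limitRepeats is a hypothetical helper name.
    static String limitRepeats(CharSequence content, int maxRepeats) {
        if (maxRepeats < 1) {
            throw new IllegalArgumentException("maxRepeats must be at least 1");
        }
        // A run longer than maxRepeats: one letter plus at least maxRepeats more copies
        Pattern patt = Pattern.compile("([a-zA-Z])\\1{" + maxRepeats + ",}", Pattern.CASE_INSENSITIVE);
        Matcher matcher = patt.matcher(content);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            String kept = matcher.group(1).repeat(maxRepeats); // String.repeat needs Java 11+
            // quoteReplacement stops '$' or '\' from being read as replacement syntax
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(kept));
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats(new StringBuffer("aaabbbccaaa"), 2)); // aabbccaa
        System.out.println(limitRepeats("aaaaabbcccccaaa", 4));               // aaaabbccccaaa
    }
}
```

Because of CASE_INSENSITIVE, a mixed-case run such as aAA is also matched, and the kept copies all take the case of the run's first letter.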

Upvotes: 1
