somnathchakrabarti

Reputation: 3086

Java Regex Matcher skipping the matches

Below is my Java code to delete all pairs of adjacent matching letters, but I am running into a problem with the Java Matcher class.

My Approach

I am trying to find all successive repeated characters in the input e.g.

aaa, bb, ccc, ddd

Next, replace each odd-length match with a single occurrence of the repeated character, and each even-length match with "", i.e.

aaa -> a
bb -> ""
ccc -> c
ddd -> d
s occurs only once, so it is not matched by the regex pattern and is left out of the substitution.

I am calling Matcher.appendReplacement to do conditional replacement of the patterns matched in input, based on the group length (even or odd).

Code:

public static void main(String[] args) {
        String s = "aaabbcccddds";
        int i=0;
        StringBuffer output = new StringBuffer();
        Pattern repeatedChars = Pattern.compile("([a-z])\\1+");
        Matcher m = repeatedChars.matcher(s);
        while(m.find()) {
            if(m.group(i).length()%2==0)
                m.appendReplacement(output, "");
            else
                m.appendReplacement(output, "$1");
            i++;
        }
        m.appendTail(output);
        System.out.println(output);
    }

Input : aaabbcccddds

Actual Output : aaabbcccds (only replacing ddd with d but skipping aaa, bb and ccc)

Expected Output : acds

Upvotes: 1

Views: 1089

Answers (3)

revo

Reputation: 48711

You don't need multiple if statements. Try:

(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)

Replace with $1

Java code:

str.replaceAll("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)", "$1");

Regex breakdown:

  • (?: Start of non-capturing group
    • (\\w) Capture a word character in group 1
    • (?:\\1\\1)+ Match an even number of further copies of that character (an odd-length run overall)
    • | Or
    • (\\w) Capture a word character in group 2
    • \\2+ Match one or more further copies of that character (any run of two or more)
  • ) End of non-capturing group
  • (?!\\1|\\2) Not followed by the captured character, so a match always covers the whole run
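The breakdown can be checked with a short, self-contained sketch. The class name OddRunDemo and the method collapse are made up for illustration, and the extra inputs are my own test cases, not from the question. It relies on Java appending nothing for $1 when group 1 did not take part in the match, which is what happens for even-length runs.

```java
import java.util.regex.Pattern;

class OddRunDemo {
    // Same pattern as in this answer, compiled once.
    private static final Pattern RUNS =
            Pattern.compile("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)");

    // Hypothetical helper: replaces every run of two or more equal characters.
    static String collapse(String input) {
        // Odd-length runs match the first alternative, so $1 holds the character.
        // Even-length runs match the second alternative, group 1 is unset,
        // and $1 contributes an empty string.
        return RUNS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaabbcccddds", "aaaa", "aaaaa", "abc"}) {
            System.out.println(s + " -> " + collapse(s));
        }
    }
}
```

For the input from the question this prints acds; aaaa collapses to an empty string, aaaaa to a, and abc is left unchanged.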

Using Pattern and Matcher with StringBuffer:

StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) m.appendReplacement(output, "$1");
m.appendTail(output);
System.out.println(output);

Upvotes: 1

anubhava

Reputation: 785128

This can be done in a single replaceAll call like this:

String repl = str.replaceAll( "(?:(.)\\1)+", "" );

The regex (?:(.)\\1)+ matches every stretch of adjacent identical pairs and replaces it with an empty string, which leaves a single character from each run of odd length.
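To see which parts of the input the pattern actually consumes, here is a minimal sketch that lists each match. The class name PairMatchTrace and the method matchedSpans are hypothetical, added only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairMatchTrace {
    private static final Pattern PAIRS = Pattern.compile("(?:(.)\\1)+");

    // Hypothetical helper: collects the text of every match, in order.
    static List<String> matchedSpans(String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = PAIRS.matcher(input);
        while (m.find()) {
            spans.add(m.group()); // group() with no argument is the whole match
        }
        return spans;
    }

    public static void main(String[] args) {
        String s = "aaabbcccddds";
        System.out.println(matchedSpans(s));                   // spans that get removed
        System.out.println(PAIRS.matcher(s).replaceAll(""));   // what is left over
    }
}
```

For aaabbcccddds the matches are aa, bbcc and dd. A single match can span pairs of different characters, because the capture group is re-captured on each repetition. The characters left over are a, c, d and s.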


Code using Pattern and Matcher:

final Pattern p = Pattern.compile( "(?:(.)\\1)+" );
Matcher m = p.matcher( "aaabbcccddds" );
String repl = m.replaceAll( "" );
//=> acds

Upvotes: 2

Veselin Davidov

Reputation: 7071

You can try it like this:

public static void main(String[] args) {
    String s = "aaabbcccddds";
    StringBuffer output = new StringBuffer();
    Pattern repeatedChars = Pattern.compile("(\\w)(\\1+)");
    Matcher m = repeatedChars.matcher(s);
    while(m.find()) {
        if(m.group(2).length()%2!=0)
            m.appendReplacement(output, "");
        else
            m.appendReplacement(output, "$1");
    }
    m.appendTail(output);
    System.out.println(output);
}

It is similar to yours, but group 1 captures only the first character of a run, so its length is always 1. That's why I introduce a second group that captures the repeated adjacent characters. Since its length is one less than the length of the whole run, I reverse the odd/even logic and voila:

acds

is printed.
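As a variation, assuming you would rather keep the odd/even check exactly as in the question, the parity can be tested on the whole match with m.group() instead of on a second group. This is only a sketch; the class name FullMatchParity and the method collapse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullMatchParity {
    // Hypothetical helper wrapping the loop from the question.
    static String collapse(String s) {
        StringBuffer output = new StringBuffer();
        Matcher m = Pattern.compile("([a-z])\\1+").matcher(s);
        while (m.find()) {
            // m.group() returns the whole run (for example "aaa"), so its
            // length is the run length and no loop counter is needed.
            if (m.group().length() % 2 == 0) {
                m.appendReplacement(output, "");   // even run: drop it entirely
            } else {
                m.appendReplacement(output, "$1"); // odd run: keep one character
            }
        }
        m.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaabbcccddds"));
    }
}
```

This also avoids the incrementing index i passed to m.group(i) in the question, which asks for a different capture group on every iteration rather than the current match.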

Upvotes: 1
