kibar

Reputation: 824

Java repeated character regex with condition

I have a large database and want to check it for capitalization errors. I use the pattern below to find repeated characters. The pattern works, but I need to add a start and an end condition for the string.

Pattern:

(\w)\1+

Target String:

Javaaa

result: aaa

I want to add a condition to the regex: the string must start with Ja and end with a*. The result must contain only the repeated characters.

(I don't want to handle this programmatically; the regex alone should do it, if that's possible.)

(I'm doing this with String.replaceAll(regex, string), not with the Pattern or Matcher classes.)

Upvotes: 1

Views: 119

Answers (2)

Mahendra

Reputation: 1426

Another code example (inspired by @Wiktor Stribiżew's code) that follows your expected input and output format.

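import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is illustrative only.
public class RepeatedChars
{
    public static void main( String[] args )
    {
        String[] input =
            { "Javaaa", "Javaaaaaaaaa", "Javaaaaaaaaaaaaaaaaaa", "Paoooo", "Paoooooooo", "Paooooooooxxxxxxxxx" };
        // Group 2 captures one character; \2+ requires at least one more copy of it.
        // Compiled once, outside the loop, because the pattern never changes.
        Pattern pattern = Pattern.compile( "((.)\\2+)" );
        for ( String str : input )
        {
            System.out.println( "Target String :" + str );
            Matcher matcher = pattern.matcher( str );
            // Print every run of two or more identical characters.
            while ( matcher.find() )
            {
                System.out.println( "result: " + matcher.group() );
            }
            System.out.println( "---------------------" );
        }
        System.out.println( "Finish" );
    }
}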

Output:

Target String :Javaaa
result: aaa
---------------------
Target String :Javaaaaaaaaa
result: aaaaaaaaa
---------------------
Target String :Javaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaa
---------------------
Target String :Paoooo
result: oooo
---------------------
Target String :Paoooooooo
result: oooooooo
---------------------
Target String :Paooooooooxxxxxxxxx
result: oooooooo
result: xxxxxxxxx
---------------------
Finish

Upvotes: 1

Wiktor Stribiżew

Reputation: 626754

You may use a lookahead anchored at the leading word boundary:

\b(?=Ja\w*a\b)\w*?((\w)\2+)\w*\b

See the regex demo

Details:

  • \b - leading word boundary
  • (?=Ja\w*a\b) - a positive lookahead that requires the whole word to start with Ja, then it can have 0+ word characters and end with a
  • \w*? - 0+ word characters but as few as possible
  • ((\w)\2+) - Group 1 matching identical consecutive characters
  • \w* - any remaining word characters (0 or more)
  • \b - trailing word boundary.

The result you are seeking is in Group 1.

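// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String s = "Prooo\nJavaaa";
Pattern pattern = Pattern.compile("\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // Group 1 holds only the run of repeated characters; prints "aaa" for this input.
    System.out.println(matcher.group(1));
}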

See the Java demo.
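Since the question mentions String.replaceAll rather than Pattern/Matcher, the same pattern can probably be used there by replacing the whole match with Group 1 (`$1`). The following is only a minimal sketch under that assumption; the class name RepeatedCharsDemo and the helper repeatedRun are made up for illustration and are not part of any library.

```java
public class RepeatedCharsDemo {
    // Same regex as in the answer above. Because the pattern consumes the
    // whole word, replacing the match with "$1" leaves only the repeated run.
    static final String REGEX = "\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b";

    // Hypothetical helper: returns the input with every matching word
    // replaced by its first run of repeated characters.
    static String repeatedRun(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(repeatedRun("Javaaa"));       // aaa
        System.out.println(repeatedRun("Prooo"));        // Prooo (no match, unchanged)
        System.out.println(repeatedRun("Prooo Javaaa")); // Prooo aaa
    }
}
```

Note that replaceAll leaves non-matching input untouched, so a word that does not start with Ja and end with a comes back unchanged rather than as an empty string.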

Upvotes: 2
