Jonalca

Reputation: 576

Java Matcher not matching even when guarded by find() in a loop

Introduction

I want to extract a substring from a String using a regular expression in Java. To do this properly, I am using the Pattern and Matcher classes.

Code

package stringlearning;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 *
 * @author Jonathan
 */
public class StringLearning {

    //Example String
    public static String line = "This is the (first|second|third) choice.";


    public static void main(String[] args) 
    {

        //String is right, right?
        System.out.println("Line is: " + line);

        //How to use RegEx in Java properly
        Pattern pattern = Pattern.compile("\\(([^\\)]+)\\)", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(line);

        //While we find, keep looping
        while(matcher.find())
        {
            //What did we find?
            System.out.println(matcher.matches());
            System.out.println(matcher.groupCount());
            System.out.println(matcher.group(1));
        }



    }

}

Problem

I still can't understand why it doesn't find anything. The regular expression was made on RegEx and works properly there (do not forget the extra escapes, '\', that a Java string literal needs).

I want to know what I am missing that keeps it from matching.

Upvotes: 4

Views: 1173

Answers (1)

anubhava

Reputation: 784928

The problem is this line inside the while loop:

System.out.println(matcher.matches());

Here matches() attempts to match the entire region against the pattern.

If the match succeeds then more information can be obtained via the start, end, and group methods.

Since your regex doesn't match the entire input, matches() returns false. That failed attempt also replaces the successful match state left by find(), so you get a java.lang.IllegalStateException when the code then calls .group(1).
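To make the state change visible, here is a minimal sketch. The class name MatchesResetsState and the helper probe are made up for illustration; only the Pattern and Matcher calls come from the standard library. It calls find(), reads the group, then calls matches() and tries to read the group again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question's code.
class MatchesResetsState {

    // Describes what happens when matches() is called after a successful find().
    static String probe(String input) {
        Matcher m = Pattern.compile("\\(([^)]+)\\)").matcher(input);
        if (!m.find()) {
            return "find() failed";
        }
        String viaFind = m.group(1);   // valid here: find() just succeeded
        boolean whole = m.matches();   // new attempt against the entire input
        try {
            return "matches()=" + whole + ", group(1)=" + m.group(1);
        } catch (IllegalStateException e) {
            // Reached when matches() failed: the earlier find() result is gone.
            return "find() saw " + viaFind + ", then matches()=" + whole
                    + " and group(1) threw IllegalStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("This is the (first|second|third) choice."));
    }
}
```

With the question's input, the group is readable right after find() but not after the failed matches() call.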

To fix it, just comment out the System.out.println(matcher.matches()); line and rerun the code.
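For reference, a sketch of the loop with that call removed could look like the following. It keeps the pattern from the question unchanged; the class name ParenExtractor and the method extract are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the pattern is the one from the question.
class ParenExtractor {

    private static final Pattern PAREN =
            Pattern.compile("\\(([^\\)]+)\\)", Pattern.DOTALL);

    // Collects the text inside every pair of parentheses in the input.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PAREN.matcher(input);
        // Only find() advances the matcher; matches() is never called in the loop.
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [first|second|third]
        System.out.println(extract("This is the (first|second|third) choice."));
    }
}
```

Because the loop relies only on find(), group(1) is always read from a successful match.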

btw you can use this shorter regex:

final Pattern pattern = Pattern.compile("\\(([^)]+)\\)");

There is no need to escape ) inside a character class, and DOTALL is redundant here since you're not using a dot (.) anywhere in your regex.
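If you want to convince yourself that the two patterns behave the same, a small comparison along these lines may help. PatternEquivalence and groups are hypothetical names, and the sample inputs are invented for the check; one of them contains a line break inside the parentheses to show that DOTALL makes no difference for a negated character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class for the two patterns discussed above.
class PatternEquivalence {

    static final Pattern ORIGINAL = Pattern.compile("\\(([^\\)]+)\\)", Pattern.DOTALL);
    static final Pattern SHORTER = Pattern.compile("\\(([^)]+)\\)");

    // Returns group 1 of every match of the pattern in the input.
    static List<String> groups(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is the (first|second|third) choice.",
            "a (x\ny) b (z)"   // line break inside the parentheses
        };
        for (String s : samples) {
            // Both patterns should yield identical group lists.
            System.out.println(groups(ORIGINAL, s).equals(groups(SHORTER, s)));
        }
    }
}
```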

Upvotes: 4
