user1419243

Reputation: 1705

Non-capturing group in java RegEx

I have written some code, but it doesn't work correctly. Below you can find my regex, the input, and the output I expect. I am using a non-capturing group because I want to read the text until I reach the word "Bundle", but I don't want "Bundle" itself to be included in the captured text. I don't know what I have done wrong that causes it not to work.

Here is my code:

Pattern pattern = Pattern.compile(
        "((Bundle\\s+Components)|(Included\\s+Components))\\s+(.*?)(?:Bundle)", Pattern.DOTALL);

Matcher matcher = pattern.matcher(tableInformation);

while (matcher.find()) {
    String bundleComponents = matcher.group();
    System.out.println(bundleComponents);
}
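A minimal, self-contained reproduction of the behaviour, assuming a hypothetical input string modeled on Example 2 (the class name `NonCapturingDemo`, the helper `firstMatchGroup` and the input are made up for illustration). It keeps the regex from the question unchanged and shows that the full match (group 0) still ends with "Bundle", while group 4, the lazy `(.*?)` capture, does not contain it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // The regex from the question, unchanged.
    static final Pattern PATTERN = Pattern.compile(
            "((Bundle\\s+Components)|(Included\\s+Components))\\s+(.*?)(?:Bundle)",
            Pattern.DOTALL);

    // Hypothetical helper: returns the given group of the first match, or null if nothing matches.
    static String firstMatchGroup(String input, int group) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input modeled on Example 2 from the question.
        String tableInformation = "Included Components\nblah blah, like above,\nBundle Type";

        // group(0) is the whole match: (?:Bundle) does not capture, but it still consumes "Bundle".
        System.out.println("[" + firstMatchGroup(tableInformation, 0) + "]");
        // group(4) is only the text captured by (.*?), so "Bundle" is not part of it.
        System.out.println("[" + firstMatchGroup(tableInformation, 4) + "]");
    }
}
```

In other words, "non-capturing" only means the text is not stored in a numbered group; it is still part of the overall match.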

Here are the examples.

Example 1:

Bundle Components bla blah\blabla?!()\\ANY CHARACTER IS POSSIBLE HERE, EVEN LINEBREAK,blah blah
Bundle Type

Example 2:

 Included Components
    blah blah, like above,
    Bundle Type

Output I expect for Ex. 1:

Bundle Components bla blah\blabla?!()\\ANY CHARACTER IS POSSIBLE HERE, EVEN LINEBREAK,blah blah

Output I expect for Ex. 2:

Included Components
blah blah, like above,

What I get as the output for Ex. 1:

Bundle Components bla blah\blabla?!()\\ANY CHARACTER IS POSSIBLE HERE, EVEN LINEBREAK,blah blah
Bundle Type

What I get as the output for Ex. 2:

Included Components
blah blah, like above,
Bundle Type

Upvotes: 1

Views: 1811

Answers (2)

Egan Wolf

Reputation: 3573

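The full match (group 0) contains everything the regex matched, including the text matched by non-capturing groups: (?:...) only stops that text from being stored in a numbered group, it still consumes it. To leave "Bundle" out, you need to read the appropriate capturing group instead of the full match. The other solution is to use a positive lookahead instead of the non-capturing group. Check the regex below. I also removed some groups that are unnecessary (IMO).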

(?:Bundle\s+Components|Included\s+Components)\s+.*?(?=Bundle)

It results in a single match (the full match) with no separate groups.


PS: With this solution, the newline character just before "Bundle" will be included in the match as well.
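A sketch of the lookahead regex above used from Java, assuming the `Pattern.DOTALL` flag from the question is kept (the backslashes are doubled for the Java string literal; the class name `LookaheadDemo`, the helper `sections` and the input are hypothetical). Because `(?=Bundle)` only checks what follows, `matcher.group()` no longer contains "Bundle", but it does keep the trailing newline mentioned in the PS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The regex from this answer, escaped for a Java string literal.
    // (?=Bundle) requires "Bundle" to follow, but does not consume it.
    static final Pattern PATTERN = Pattern.compile(
            "(?:Bundle\\s+Components|Included\\s+Components)\\s+.*?(?=Bundle)",
            Pattern.DOTALL);

    // Hypothetical helper: collects every full match (group 0) found in the input.
    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input modeled on Example 2 from the question.
        String tableInformation = "Included Components\nblah blah, like above,\nBundle Type";
        for (String section : sections(tableInformation)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

If the trailing newline is unwanted, calling `trim()` on each result is one simple option.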

Upvotes: 1

Marco Luzzara

Reputation: 6036

You can do this with a positive lookahead, since the pattern inside a lookahead group is not included in the match:

((?:Bundle\\s+Components)|(?:Included\\s+Components))\\s+(.*?)(?=Bundle)

(not tested)
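A quick way to check the regex above, assuming the `Pattern.DOTALL` flag from the question is kept so that `.` also matches line breaks (the class name `LookaheadGroupsDemo`, the helper `firstMatch` and the input are hypothetical). With this pattern, group 1 is the heading, group 2 is the text before "Bundle", and neither the groups nor the full match contain "Bundle".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadGroupsDemo {
    // The regex from this answer; group 1 = heading, group 2 = body.
    // The lookahead (?=Bundle) is checked but not included in any group or in the full match.
    static final Pattern PATTERN = Pattern.compile(
            "((?:Bundle\\s+Components)|(?:Included\\s+Components))\\s+(.*?)(?=Bundle)",
            Pattern.DOTALL);

    // Hypothetical helper: returns {fullMatch, heading, body} for the first match, or null if none.
    static String[] firstMatch(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(), matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical input modeled on Example 1 from the question.
        String tableInformation = "Bundle Components bla blah\nBundle Type";
        String[] parts = firstMatch(tableInformation);
        if (parts != null) {
            System.out.println("full match: [" + parts[0] + "]");
            System.out.println("heading:    [" + parts[1] + "]");
            System.out.println("body:       [" + parts[2] + "]");
        }
    }
}
```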

Upvotes: 1
