AgentM

Reputation: 406

Non capturing group in Java Scanner is ignored

I am trying to get the Scanner to split a string on every @ symbol, except when it is escaped (or at the start of a line).

My RegEx: (?:[^\\])@

(?:            // Start of non-capturing group (0)
  [            // Match any characters in square brackets [
    ^\\        // Match any non-\ character.
  ]            // ]
)              // End of non-capturing group (0)
@              // Match literal '@'

From my understanding, this should do what I intend.

However, when I use this pattern in a Scanner, the character matched by the non-capturing group is treated as part of the delimiter. I expected the group to be used only for matching, with just the '@' being the delimiter (the part removed at each split). For example, the String "Hello@World" should yield ["Hello", "World"].

However, running the code sample below:

private static void test() {
    try (Scanner sc = new Scanner("test@here")) {
        sc.useDelimiter("(?:[^\\\\])@"); // Every unescaped @ sign.
        while (sc.hasNext()) {
            String token = sc.next();
            System.out.println(token);
        }
    }   
}

yields:

tes
here

instead of the expected:

test
here

Upvotes: 2

Views: 147

Answers (2)

AgentM

Reputation: 406

The Scanner doesn't treat groups specially the way replaceAll lets you refer back to them; the whole match is used as the delimiter.

Instead, you should use a negative lookbehind. Your pattern would then look like this:

(?<!\\)@

The ?: is simply replaced with ?<!, which turns the non-capturing group into a negative lookbehind. This also removes the need for the negated character class, since the lookbehind does the negation itself.
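
To see why the original pattern drops a character, it can help to print what each pattern actually matches. The following is a minimal sketch; the class name WholeMatchDemo and the helper firstMatch are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatchDemo {
    // Hypothetical helper: returns the first full match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Non-capturing group: the preceding character is still part of the match.
        System.out.println(firstMatch("(?:[^\\\\])@", "test@here")); // prints "t@"
        // Negative lookbehind: only the '@' itself is matched.
        System.out.println(firstMatch("(?<!\\\\)@", "test@here"));   // prints "@"
    }
}
```

Since the Scanner removes the whole match, the first pattern takes the "t" along with the "@", while the second removes only the "@".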

Upvotes: 2

kshetline

Reputation: 13682

The delimiter is the entire match, regardless of any groups, capturing or non-capturing.

What you need is a lookbehind pattern, and the syntax is easier here with a negative lookbehind.

sc.useDelimiter("(?<!\\\\)@");
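
As a rough, self-contained sketch of how this delimiter behaves, the example below collects the tokens into a list. The class name UnescapedAtSplit, the helper splitOnUnescapedAt, and the second input string are assumptions added for illustration; only the delimiter and the "test@here" input come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class UnescapedAtSplit {
    // Hypothetical helper: splits input on every '@' that is not preceded by a backslash.
    static List<String> splitOnUnescapedAt(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            // Regex (?<!\\)@ : an '@' not immediately preceded by '\'.
            sc.useDelimiter("(?<!\\\\)@");
            while (sc.hasNext()) {
                tokens.add(sc.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnUnescapedAt("test@here")); // prints [test, here]
        System.out.println(splitOnUnescapedAt("a\\@b@c"));   // prints [a\@b, c]
    }
}
```

Note that the escaped @ stays in the token together with its backslash; removing the escape character would be a separate step.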

Upvotes: 5
