Steve M

Reputation: 9784

Why isn't [\\s*] equivalent to \\s*?

I'm just getting into regex in Java. I'm reading a book and the Java docs, and I can't figure out why, given the following program, "[\\s*]" is not equivalent to "\\s*" when used as a delimiter. It seems "[\\s*]" is equivalent to "\\s+". Can someone walk me through the logic of why this is so?

import java.util.Scanner;
import java.util.regex.Pattern;
public class ScanString {
    public static void main(String[] args) {
        String str = "Smith , where Jones had had 'had', had had 'had had'.";
        String regex = "had";
        System.out.println("String is:\n" + str + "\nToken sought is " + regex);

        Pattern had = Pattern.compile(regex);
        Scanner strScan = new Scanner(str);
        // Delimiter under test; swap in "\\s+" or "[\\s*]" to compare the output.
        strScan.useDelimiter("\\s*");
        int hadCount = 0;
        while(strScan.hasNext()) {
            if(strScan.hasNext(had)) {
                ++hadCount;
                System.out.println("Token found!: " + strScan.next(had));

            } else {
                System.out.println("Token is    : " + strScan.next());
            }
        }
        System.out.println("Count is: " + hadCount);
    }
}

With the delimiter "\\s*", the output, which makes sense to me, is every non-whitespace character as a separate token. When the delimiter is changed to "\\s+" or "[\\s*]", the output is:

String is:
Smith , where Jones had had 'had', had had 'had had'.
Token sought is had
Token is    : Smith
Token is    : ,
Token is    : where
Token is    : Jones
Token found!: had
Token found!: had
Token is    : 'had',
Token found!: had
Token found!: had
Token is    : 'had
Token is    : had'.
Count is: 4

Upvotes: 3

Views: 1202

Answers (2)

Cyrille Ka

Reputation: 15523

Brackets [] enclose a character class. Inside them, rules about special characters are different. The only special characters there are "the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-)." (taken from this page)

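So in this case [\\s*] means "exactly one character that is either a whitespace character or a literal *". The * loses its quantifier meaning inside the brackets.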
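To see why the two delimiters only look alike on the question's input, here is a minimal sketch. The class name DelimiterDemo, the helper tokens, and the sample string are made up for illustration. The sentence in the question has exactly one whitespace character between tokens and no * anywhere, so a one-character delimiter happens to split it the same way "\\s+" does. Once the input contains two spaces in a row or a literal *, the results should diverge:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: collects every token Scanner yields for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input: two spaces after 'a', a literal '*' after 'b'.
        String sample = "a  b*c";

        // "\\s+" consumes the whole run of whitespace as one delimiter.
        System.out.println(tokens(sample, "\\s+"));

        // "[\\s*]" consumes one character at a time, so the second space
        // leaves an empty token behind, and '*' also acts as a delimiter.
        System.out.println(tokens(sample, "[\\s*]"));
    }
}
```

The Scanner documentation notes the same effect: a delimiter such as "\\s" can return empty tokens because it only passes one whitespace character at a time, whereas "\\s+" cannot.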

When you are dealing with regular expressions, you can use websites like RegexPlanet (to test your code) or Regexper (to visualize the regexp graphically).

Upvotes: 4

Pshemo

Reputation: 124235

[] is a character class. Take a look at this example: [abc] means a|b|c. If you create something like [a*], it will mean a|\\* (an a or a * character).
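A quick way to check this yourself is to compare whole-string matches for [a*] and a*. This is only a sketch; the class name CharClassDemo and the helper wholeMatch are made up for illustration, and Pattern.matches is the standard java.util.regex call that tests the entire input against the regex:

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [a*] is a character class: exactly one character, either 'a' or a literal '*'.
        System.out.println(wholeMatch("[a*]", "a"));   // true
        System.out.println(wholeMatch("[a*]", "*"));   // true
        System.out.println(wholeMatch("[a*]", "aaa")); // false: a class matches one character
        System.out.println(wholeMatch("[a*]", ""));    // false: one character is required

        // a* is 'a' followed by a quantifier: zero or more 'a' characters.
        System.out.println(wholeMatch("a*", "aaa"));   // true
        System.out.println(wholeMatch("a*", ""));      // true
        System.out.println(wholeMatch("a*", "*"));     // false: '*' is not a literal here
    }
}
```

The same reasoning carries over to the question: [\\s*] matches a single whitespace character or a single *, while \\s* matches any run of whitespace, including the empty string.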

Upvotes: 1
