Reputation: 9784
I'm just getting into regex in Java, working through a book and the Java docs, and I can't figure out why, given the following program, "[\\s*]" is not equivalent to "\\s*" when used as a delimiter. It seems "[\\s*]" is equivalent to "\\s+". Can someone walk me through the logic of why this is so?
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScanString {
    public static void main(String[] args) {
        String str = "Smith , where Jones had had 'had', had had 'had had'.";
        String regex = "had";
        System.out.println("String is:\n" + str + "\nToken sought is " + regex);

        Pattern had = Pattern.compile(regex);
        Scanner strScan = new Scanner(str);
        // The delimiter under test; swap in "\\s+" or "[\\s*]" to compare.
        strScan.useDelimiter("\\s*");

        int hadCount = 0;
        while (strScan.hasNext()) {
            // hasNext(Pattern) is true only if the whole next token matches.
            if (strScan.hasNext(had)) {
                ++hadCount;
                System.out.println("Token found!: " + strScan.next(had));
            } else {
                System.out.println("Token is : " + strScan.next());
            }
        }
        System.out.println("Count is: " + hadCount);
    }
}
The output, which makes sense to me, is every non-whitespace character as a separate token. When the delimiter is changed to "\\s+" or "[\\s*]", the output is:
String is:
Smith , where Jones had had 'had', had had 'had had'.
Token sought is had
Token is : Smith
Token is : ,
Token is : where
Token is : Jones
Token found!: had
Token found!: had
Token is : 'had',
Token found!: had
Token found!: had
Token is : 'had
Token is : had'.
Count is: 4
Upvotes: 3
Views: 1202
Reputation: 15523
Brackets [] enclose a character class. Inside them, the rules about special characters are different. The only special characters there are "the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-)." (taken from this page)
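So in this case [\\s*] means "either a whitespace character or a literal *". The * is not a quantifier inside the brackets, so the class matches exactly one character.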
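To see the difference with a Scanner, here is a minimal sketch that collects the tokens produced by each delimiter. The class name DelimiterDemo, the helper tokens, and the sample input "had had*had" are made up for illustration; the input includes a literal * so that the character class visibly treats it as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: collects every token the Scanner yields
    // when it is configured with the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "had had*had";

        // "\\s*" can match the empty string, so it can also match
        // between two adjacent non-whitespace characters.
        System.out.println(tokens(input, "\\s*"));

        // "\\s+" needs at least one whitespace character; '*' is ordinary text.
        System.out.println(tokens(input, "\\s+"));   // [had, had*had]

        // "[\\s*]" is one character: whitespace or a literal '*'.
        System.out.println(tokens(input, "[\\s*]")); // [had, had, had]
    }
}
```

On the string in the question, "\\s+" and "[\\s*]" give the same tokens only because that string contains no * and never has two whitespace characters in a row; in general the two patterns are not equivalent.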
When you are dealing with regular expressions, you can use websites like RegexPlanet (to test your code) or Regexper (to visualize the regexp graphically).
Upvotes: 4
Reputation: 124235
[] is a character class. Take a look at this example: [abc] means a|b|c. If you write something like [a*], it means a|\\* (an a or a * character).
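A quick way to check this is to compare [a*] with a* using Pattern.matches, which requires the whole input to match. This is only a sketch; the class name CharClassDemo and the helper fullMatch are invented for the example.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Character class: exactly one character, either 'a' or '*'.
        System.out.println(fullMatch("[a*]", "a"));  // true
        System.out.println(fullMatch("[a*]", "*"));  // true
        System.out.println(fullMatch("[a*]", "aa")); // false

        // Quantifier: zero or more 'a' characters.
        System.out.println(fullMatch("a*", "aa"));   // true
        System.out.println(fullMatch("a*", ""));     // true
        System.out.println(fullMatch("a*", "*"));    // false
    }
}
```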
Upvotes: 1