Destructor

Reputation: 3284

Does -* have any special meaning in regular expression?

I have a string:

String str = "Hello+Bye-see*Go/ok";

Now, I wanted to split it on +, -, * and /, so I did:

str.split("[+-*/]");

But this failed, throwing an error:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 3
[+-*/]
   ^
    at java.util.regex.Pattern.error(Pattern.java:1924)
    at java.util.regex.Pattern.range(Pattern.java:2594)
    at java.util.regex.Pattern.clazz(Pattern.java:2507)
    at java.util.regex.Pattern.sequence(Pattern.java:2030)
    at java.util.regex.Pattern.expr(Pattern.java:1964)
    at java.util.regex.Pattern.compile(Pattern.java:1665)
    at java.util.regex.Pattern.<init>(Pattern.java:1337)
    at java.util.regex.Pattern.compile(Pattern.java:1022)
    at java.lang.String.split(String.java:2313)
    at java.lang.String.split(String.java:2355)

Then I changed the regex to:

str.split("[-+*/]");

And it works perfectly fine! So I was wondering: does -* have any special meaning? What did I do wrong in the regex [+-*/]?

Upvotes: 3

Views: 2057

Answers (1)

zx81

Reputation: 41838

A. Where is the Error?

The problem is not -*. The problem is that inside a [character class], the hyphen - has a special meaning: it denotes a range. For instance, [a-z] means all characters from a to z. So when your class contains +-*, you are asking for all characters from + (ASCII 43) to * (ASCII 42). Because the range would end before it starts, it is invalid, hence the error. Technically, as @Pshemo points out in a comment, Java orders characters by their Unicode code points rather than by ASCII codes. But since the first 128 Unicode code points are identical to ASCII, the result is the same.
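A minimal sketch to check this yourself. The helper `compiles` is a hypothetical name (not part of any library); it simply wraps `Pattern.compile` and reports whether it throws `PatternSyntaxException`. It also prints the code points of + and * to show that the range +-* runs backwards.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RangeCheck {
    // Hypothetical helper: true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // '+' is U+002B (43) and '*' is U+002A (42), so "+-*" is a descending range.
        System.out.println((int) '+');          // 43
        System.out.println((int) '*');          // 42
        System.out.println(compiles("[+-*/]")); // false: illegal character range
        System.out.println(compiles("[-+*/]")); // true: a leading hyphen is literal
    }
}
```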

You need to either escape the hyphen as \- (written "\\-" inside a Java string literal) or, as you have observed, put the - at the front (or back) of your class, where it cannot indicate a character range:

[-+*/]

Therefore, in a split (using the "at the back" version for variety):

String[] result = your_original_string.split("[+*/-]");
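For comparison, here is a small sketch that splits the question's string with all three fixes: hyphen at the front, hyphen at the back, and hyphen escaped. The class name `SplitOperators` and the `PATTERNS` array are only for illustration. Note that the regex escape \- must be written as "\\-" in Java source, because the backslash itself has to be escaped in a string literal.

```java
import java.util.Arrays;

public class SplitOperators {
    // Illustrative list of equivalent character classes for +, -, * and /.
    static final String[] PATTERNS = {
        "[-+*/]",   // hyphen first: cannot start a range, so it is literal
        "[+*/-]",   // hyphen last: nothing follows it to end a range
        "[+\\-*/]"  // hyphen escaped: the Java literal "\\-" becomes the regex \-
    };

    public static void main(String[] args) {
        String str = "Hello+Bye-see*Go/ok";
        for (String p : PATTERNS) {
            // Each line prints [Hello, Bye, see, Go, ok]
            System.out.println(p + " -> " + Arrays.toString(str.split(p)));
        }
    }
}
```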

B. But [*-+] would be valid!!! (ASCII 42 to 43)

If you reverse the + and the *, you have a valid ASCII range (42 to 43). Of course there's no point doing so, since (i) there are no characters in the middle and (ii) that would confuse my dog.
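To see what [*-+] actually matches, here is a quick sketch (the `ReversedRange` class and its `matches` helper are hypothetical names). The class contains only * and +, so it would not work for the original split: - and / fall outside the range.

```java
import java.util.regex.Pattern;

public class ReversedRange {
    // [*-+] is the range U+002A..U+002B, i.e. exactly '*' and '+'.
    static final Pattern RANGE = Pattern.compile("[*-+]");

    static boolean matches(String s) {
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*")); // true
        System.out.println(matches("+")); // true
        System.out.println(matches("-")); // false: here the hyphen is the range operator, not a member
        System.out.println(matches("/")); // false
    }
}
```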

C. Does -* have special meaning?

It does, but not inside a character class. Outside one, -* means "match a hyphen zero or more times".
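A short sketch of -* outside a class (the `HyphenStar` class is only illustrative). It uses `Matcher.matches()`, which requires the whole input to match the pattern.

```java
import java.util.regex.Pattern;

public class HyphenStar {
    // Outside a character class, "-*" means: a literal hyphen, repeated zero or more times.
    static final Pattern HYPHENS = Pattern.compile("-*");

    static boolean matches(String s) {
        return HYPHENS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(""));    // true: zero hyphens
        System.out.println(matches("-"));   // true
        System.out.println(matches("---")); // true
        System.out.println(matches("-a"));  // false: 'a' is not a hyphen
    }
}
```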

Upvotes: 17
