Skizit

Reputation: 44842

Java regex error

Whenever I enter the following...

Pattern pmessage = Pattern.compile("\s*\p{Alnum}[\p{Alnum}\s]*");
Matcher mmessage = pmessage.matcher(message);
Matcher msubject = pmessage.matcher(subject);

I get an "Invalid escape sequence" error. Does anyone have any idea why, and how I can fix it?

Upvotes: 0

Views: 2743

Answers (4)

tchrist

Reputation: 80384

For a version of \p{Alpha} that works on the Java native character set, instead of being stuck unable to process anything other than legacy data from the 1960s, you need to use

alphabetics = "[\\pL\\pM\\p{Nl}]";

For a version of numerics in the same sense, you have to choose which of these you want:

ASCII_digits    = "[0-9]";
all_numbers     = "\\pN";
decimal_numbers = "\\p{Nd}";

because which one applies varies depending on circumstances. We’ll assume you copied one of those three to a numerics variable.

Assuming you then want alphanumerics based on the definition above, you could then write:

 alphanumerics = "[" + alphabetics + numerics + "]";
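As a minimal sketch of how the pieces above fit together, the following compiles the combined class and compares it with the ASCII-only \p{Alnum}. The class name, the method names, and the choice of \p{Nd} for numerics are assumptions made for illustration only; non-ASCII test characters are written as \u escapes to avoid source-encoding problems.

```java
import java.util.regex.Pattern;

public class UnicodeAlnumDemo {
    // Letters, combining marks, and letter-like numbers (e.g. Roman numerals).
    static final String ALPHABETICS = "[\\pL\\pM\\p{Nl}]";
    // Assumption for this sketch: decimal digits in any script.
    static final String NUMERICS = "\\p{Nd}";
    // Java allows nested character classes, so this is the union of both sets.
    static final String ALPHANUMERICS = "[" + ALPHABETICS + NUMERICS + "]";

    static final Pattern UNICODE_WORD = Pattern.compile(ALPHANUMERICS + "+");
    // Without extra flags, \p{Alnum} only covers ASCII letters and digits.
    static final Pattern ASCII_WORD = Pattern.compile("\\p{Alnum}+");

    static boolean isUnicodeAlnum(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    static boolean isAsciiAlnum(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "r\u00e9sum\u00e9\u0663"; // "resume" with e-acute, plus an Arabic-Indic digit
        System.out.println(isUnicodeAlnum(word));
        System.out.println(isAsciiAlnum(word));
    }
}
```

Under these assumptions, the Unicode-aware class accepts the accented word while the \p{Alnum} version rejects it.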

However, if what you mean by alphanumerics is the \w sense of program identifiers, you have to add some stuff.

 identifier_chars = "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

This issue is discussed at length in this answer, where you’ll also find a link to some alpha code of mine that does these transforms for you automatically. I hope to get a chance to rewrite it to take up less space this weekend.

Upvotes: 2

Mikarnage

Reputation: 893

You didn't escape your "\" characters correctly: in Java, "\\s" gives you \s, so you should write:

Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

Upvotes: 1

RoToRa

Reputation: 38390

Keep in mind that backslashes are special characters in Java string literals and need to be escaped with an additional backslash:

Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

Upvotes: 1

NPE

Reputation: 500227

Double each backslash: Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*")

Backslashes inside string literals have a special meaning and have to be doubled for an actual backslash character to become part of the string, which is what your regex requires.
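To make the doubling concrete, here is a small sketch that compiles the corrected pattern from the question and tries it on a few inputs. The class name, method name, and sample strings are invented for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // In source code "\\s" is the two-character string \s,
    // which the regex engine then interprets as "any whitespace".
    static final Pattern MESSAGE =
            Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

    static boolean isValid(String text) {
        // matches() requires the whole input to fit the pattern.
        return MESSAGE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\s".length());              // length of the string the regex engine sees
        System.out.println(isValid("  Hello world 42")); // leading whitespace, then letters, digits, spaces
        System.out.println(isValid("   "));              // whitespace only, no alphanumeric character
        System.out.println(isValid("Hello, world"));     // contains a comma
    }
}
```

Only the first of the three sample inputs passes: the pattern needs at least one alphanumeric character and allows nothing besides alphanumerics and whitespace.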

Upvotes: 1
