Reputation: 911
I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.
This is the code I've written:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", "");
However, the compiler gave me an error message pointing to the s saying it's an illegal escape character. The program compiled OK before I added the \s to the end of the regular expression, but the problem with that was that the spaces between words in the paragraph were stripped out.
How can I fix this error?
Upvotes: 37
Views: 71668
Reputation: 1772
Take a look at this site, where you can test Java regular expressions online and get well-formatted regex string patterns back:
http://www.regexplanet.com/advanced/java/index.html
Upvotes: 1
Reputation: 8511
You need to escape the backslash in the Java string literal so that the regular expression engine receives \s:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Upvotes: 15
Reputation: 61512
Generally, whenever you see that error, it means you have a single backslash where you need two:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
Upvotes: 5
Reputation: 15520
You need to double-escape the \ character: "[^a-zA-Z0-9\\s]"
Java will interpret \s as a Java String escape sequence, which is indeed an invalid Java escape. By writing \\, you escape the \ character, essentially sending a single \ character to the regex. This \ then becomes part of the regex escape sequence \s.
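To make the two levels of escaping described above concrete, here is a minimal sketch. The class name StripNonAlphanumeric, the helper method, and the sample sentence are made up for illustration and are not from the question. It prints the pattern as the regex engine receives it, then applies it to a short string.

```java
public class StripNonAlphanumeric {

    // Hypothetical helper (name chosen for this example only).
    static String stripNonAlphanumeric(String text) {
        // Source code contains "\\s": the compiler turns the escaped backslash
        // into a single '\' character, so the regex engine receives \s,
        // which it treats as the whitespace character class.
        return text.replaceAll("[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        String pattern = "[^a-zA-Z0-9\\s]";

        // Shows the string after the compiler has processed the literal.
        System.out.println(pattern); // prints [^a-zA-Z0-9\s]

        // Punctuation is removed; spaces between words are kept.
        System.out.println(stripNonAlphanumeric("Hello, world! It's 2024."));
        // prints Hello world Its 2024
    }
}
```

Note that \s matches any whitespace, including tabs and newlines. If only literal spaces should survive, a plain space inside the character class ("[^a-zA-Z0-9 ]") would be enough and needs no escaping at all.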
Upvotes: 58