Victoria

Reputation: 911

Java regular expression to remove all non-alphanumeric characters EXCEPT spaces

I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.

This is the code I've written:

paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", "");

However, the compiler reported an error pointing to the s, saying it is an illegal escape character. The program compiled fine before I added \s to the end of the regular expression, but without it the spaces between the words in the paragraph were stripped out as well.

How can I fix this error?

Upvotes: 37

Views: 71668

Answers (5)

WilliamK

Reputation: 1772

Please take a look at this site. You can test Java regexes online and get well-formatted regex string patterns back:

http://www.regexplanet.com/advanced/java/index.html

Upvotes: 1

NominSim

Reputation: 8511

You need to escape the \ in the Java string so that the regular expression receives \s:

paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");

Upvotes: 15

Hunter McMillen

Reputation: 61512

Generally, whenever you see that error, it means you have a single backslash where you need two:

paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
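The same rule applies to other regex escapes written inside Java string literals. Here is a minimal sketch; the class name RegexEscapeExamples and its helper methods are made up for illustration and are not part of the question's code:

```java
public class RegexEscapeExamples {

    // Hypothetical helper: the regex \d (a digit) is written "\\d" in Java source.
    static String removeDigits(String text) {
        return text.replaceAll("\\d", "");
    }

    // Hypothetical helper: the regex \. (a literal dot) is written "\\." in Java source.
    static String[] splitOnDot(String text) {
        return text.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(removeDigits("a1b22c"));                // abc
        System.out.println(splitOnDot("www.example.com").length);  // 3
    }
}
```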

Upvotes: 5

Igor Chubin

Reputation: 64563

Victoria, you must write \\s, not \s, here.

Upvotes: 4

jqno

Reputation: 15520

You need to double-escape the \ character: "[^a-zA-Z0-9\\s]"

The Java compiler tries to interpret \s as a Java string escape sequence, and it is not a valid one (at least before Java 15, which added \s as an escape for a space). By writing \\, you escape the \ character, so that a single \ ends up in the string passed to the regex engine. That \ then forms the regex escape sequence \s.
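To make the two layers visible, here is a small sketch. The class name StripNonAlphanumeric, the strip method and the sample sentence are assumptions for illustration only; the pattern itself is the one from the question with the backslash doubled:

```java
public class StripNonAlphanumeric {

    // Hypothetical helper wrapping the corrected call from the question.
    static String strip(String text) {
        // Java source "\\s" -> string contents \s -> regex whitespace class.
        return text.replaceAll("[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        // The literal "\\s" holds two characters: a backslash and an 's'.
        System.out.println("\\s".length());                    // 2
        System.out.println("\\s");                             // \s

        // Punctuation is removed, spaces between words are kept.
        System.out.println(strip("Hello, World! It's 2012.")); // Hello World Its 2012
    }
}
```

Note that \s matches any whitespace, so tabs and line breaks are kept too. If only the space character should survive, a literal space inside the character class should work instead.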

Upvotes: 58
