Housefly

Reputation: 4422

Regex for special characters in java

public static final String specialChars1= "\\W\\S";
String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");

public static final String specialChars2 = "`~!@#$%^&*()_+[]\\;\',./{}|:\"<>?";
String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");

Whatever str1 contains, I want all characters other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).

My problem is that if I use specialChars1, it does not remove some characters such as ;, ', and ", and if I use specialChars2, it gives me an error:

java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:

How can this be achieved? I have searched but could not find a complete solution.

Upvotes: 4

Views: 98311

Answers (6)

TG Gowda

Reputation: 11917

I had a similar problem to solve and I used the following method:

text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");

Code with time benchmarking

public static String cleanPunctuations(String text) {
    return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}

public static void test(String in){
    long t1 = System.currentTimeMillis();
    String out = cleanPunctuations(in);
    long t2 = System.currentTimeMillis();
    System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");

}

public static void main(String[] args) {
    String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
            "[`~!@#$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
    test(s1);
    String s2 = "\"Sample Text=\"  with - minimal \t punctuation's";
    test(s2);
}

Sample Output

In=My text with 212354 digits spaces and 
 newline     tab [`~!@#$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text="  with - minimal    punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms

Upvotes: 1

LeHill

Reputation: 1

@npinti

Using "\w" is nearly the same as "\dA-Za-z"; the only difference is that "\w" also matches the underscore (_).

This worked for me:

String result = str.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
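
A small sketch to check how the two character classes treat an underscore. The class and method names here are made up for illustration; only the two patterns come from this thread.

```java
public class WordClassDemo {
    // Keeps word characters (letters, digits, underscore) and spaces.
    static String keepWordChars(String s) {
        return s.replaceAll("[^\\w ]", "");
    }

    // Keeps only ASCII letters, digits and spaces.
    static String keepAlnum(String s) {
        return s.replaceAll("[^\\dA-Za-z ]", "");
    }

    public static void main(String[] args) {
        String input = "a_b-c 1";
        System.out.println(keepWordChars(input)); // a_bc 1
        System.out.println(keepAlnum(input));     // abc 1
    }
}
```

If underscores should be removed as well, the explicit class "[^\\dA-Za-z ]" is probably the safer choice.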

Upvotes: 0

user2011942

Reputation: 1

you can use a regex like this:

[<#![CDATA[¢<(+|!$*);¬/¦,%_>?:#="~{@}\]]]#>]`

Remove the first and last "#" from the expression.

regards

Upvotes: 0

Dunes

Reputation: 40683

The problem with your first regex is that "\W\S" means: find a sequence of two characters, the first of which is not a letter, digit or underscore, followed by a character which is not whitespace.

What you mean is "[^\w\s]", which means: find a single character which is neither a word character (letter, digit or underscore) nor whitespace. (We can't use "[\W\S]", as this means find a character which is not a word character OR is not whitespace, and that matches every character.)
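
To make the difference concrete, here is a minimal sketch that runs the three patterns discussed above against one short string. The class name, the `strip` helper and the sample input are illustrative assumptions, not part of the question.

```java
public class RegexComparison {
    // Removes every match of the given regex from the input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "a;b c!";
        // Two-character sequence: eats ";b" and " c", but leaves the trailing "!".
        System.out.println(strip("\\W\\S", input));    // a!
        // Union of non-word and non-whitespace: matches everything.
        System.out.println(strip("[\\W\\S]", input));  // (empty line)
        // Neither a word character nor whitespace: removes only ";" and "!".
        System.out.println(strip("[^\\w\\s]", input)); // ab c
    }
}
```

The first result shows why some punctuation survives with "\W\S": a special character is only removed when a non-whitespace character follows it, and that following character is removed too.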

The second regex is a problem because you are trying to use reserved characters without escaping them. You can enclose them in [] where most characters (not all) do not have special meanings, but the whole thing would look very messy and you have to check that you haven't missed out any punctuation.

Example:

String sequence = "qwe 123 :@~ ";

String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");

String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");

System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');

This outputs:

without special chars: 'qwe 123  '
spaces as pluses: 'qwe+123++'

If you want to collapse multiple spaces into a single + then use "\s+" as your regex instead (remember to escape the backslash in the Java string literal, i.e. "\\s+").
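
A sketch of that variant, using the same sample string as above. The `trim()` call is an extra assumption of mine (it drops leading and trailing whitespace so the result does not start or end with a +); leave it out if those plus signs are wanted. The class and method names are hypothetical.

```java
public class CollapseSpaces {
    static String toPlusSeparated(String input) {
        return input.replaceAll("[^\\w\\s]", "") // drop special characters
                    .trim()                      // assumption: no leading/trailing '+'
                    .replaceAll("\\s+", "+");    // each run of whitespace becomes one '+'
    }

    public static void main(String[] args) {
        System.out.println(toPlusSeparated("qwe 123 :@~ ")); // qwe+123
    }
}
```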

Upvotes: 2

npinti

Reputation: 52185

This worked for me:

String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");

For this input string:

/-+!@#$%^&())";:[]{}\ |wetyk 678dfgh

It yielded this result:

+wetyk+678dfgh

Upvotes: 17

Joey

Reputation: 354416

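replaceAll expects a regex, so wrap the characters in a character class and escape the brackets and the backslash:

// The inner '[' is escaped as well: an unescaped '[' inside a character class opens a nested class in Java.
public static final String specialChars2 = "[`~!@#$%^&*()_+\\[\\]\\\\;\',./{}|:\"<>?]";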
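
If hand-escaping feels error-prone, one alternative is to build the character class from a plain list of characters. This is only a sketch: `SpecialCharClass` and `toCharClass` are hypothetical names, and it relies on the documented rule of java.util.regex.Pattern that a backslash before a non-alphabetic character always yields that literal character.

```java
public class SpecialCharClass {
    // Builds a regex character class that matches any of the given literal characters.
    static String toCharClass(String literals) {
        StringBuilder sb = new StringBuilder("[");
        literals.codePoints().forEach(cp -> {
            // Letters and digits are left alone, since a backslash would change their meaning.
            if (!Character.isLetterOrDigit(cp)) {
                sb.append('\\');
            }
            sb.appendCodePoint(cp);
        });
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String specialChars = toCharClass("`~!@#$%^&*()_+[]\\;',./{}|:\"<>?");
        String cleaned = "a[b]c;d\\e f".replaceAll(specialChars, "").replace(" ", "+");
        System.out.println(cleaned); // abcde+f
    }
}
```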

Upvotes: 7
