Reputation: 4422
public static final String specialChars1= "\\W\\S";
String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");
public static final String specialChars2 = "`~!@#$%^&*()_+[]\\;\',./{}|:\"<>?";
String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");
Whatever str1 is, I want every character other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).
My problem is that if I use specialChars1, it does not remove some characters, such as ;, ', and ", and if I use specialChars2, it gives me an error:
java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:
How can this be achieved? I have searched but could not find a perfect solution.
Upvotes: 4
Views: 98311
Reputation: 11917
I had a similar problem to solve and I used the following method:
text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
public static String cleanPunctuations(String text) {
    return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}
public static void test(String in){
    long t1 = System.currentTimeMillis();
    String out = cleanPunctuations(in);
    long t2 = System.currentTimeMillis();
    System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");
}
public static void main(String[] args) {
    String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
            "[`~!@#$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
    test(s1);
    String s2 = "\"Sample Text=\" with - minimal \t punctuation's";
    test(s2);
}
In=My text with 212354 digits spaces and
newline tab [`~!@#$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text=" with - minimal punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms
Upvotes: 1
Reputation: 1
@npinti
Using "\w" is almost the same as "\dA-Za-z", except that "\w" also matches the underscore (_).
This worked for me:
String result = str.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
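The difference between the two character classes only shows up when the input contains an underscore. Here is a minimal sketch of that; the class name, method names, and sample input are made up for illustration and are not from the answers above.

```java
class WordClassDemo {
    // Keeps letters, digits, underscores, and spaces, because \w includes '_'.
    static String cleanWithWordClass(String s) {
        return s.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
    }

    // Keeps only ASCII letters, digits, and spaces, so '_' is removed as well.
    static String cleanWithExplicitClass(String s) {
        return s.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        String input = "snake_case name; v2";
        System.out.println(cleanWithWordClass(input));     // snake_case+name+v2
        System.out.println(cleanWithExplicitClass(input)); // snakecase+name+v2
    }
}
```

If underscores should count as special characters, the explicit class is probably the safer choice.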
Upvotes: 0
Reputation: 1
You can use a regex like this:
[<#![CDATA[¢<(+|!$*);¬/¦,%_>?
:#="~{@}\]]]#>]`
Remove the "#" at the start and at the end of the expression.
Regards
Upvotes: 0
Reputation: 40683
The problem with your first regex is that "\W\S" means: find a sequence of two characters, the first of which is not a word character (letter, digit, or underscore) and the second of which is not whitespace.
What you mean is "[^\w\s]", which means: find a single character that is neither a word character nor whitespace. (We can't use "[\W\S]", as this means: find a character that is not a word character OR is not whitespace, which matches essentially every character.)
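To see the three patterns side by side, here is a small sketch that removes whatever each one matches from the same input. The class name, the strip helper, and the sample string are assumptions chosen for illustration.

```java
class CharClassDemo {
    // Removes every match of the given regex from the input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "a;b 'c' \"d\"";   // the text: a;b 'c' "d"

        // Two-character sequence: a non-word character followed by a non-whitespace
        // character. Letters get eaten and some quotes survive.
        System.out.println(strip("\\W\\S", input));     // ac'd"

        // Single character that is neither a word character nor whitespace.
        System.out.println(strip("[^\\w\\s]", input));  // ab c d

        // Single character that is non-word OR non-whitespace: matches everything.
        System.out.println(strip("[\\W\\S]", input));   // (empty line)
    }
}
```

The first pattern explains why characters such as ; ' and " appear to be skipped: each match consumes two characters, so a punctuation mark followed by whitespace or the end of the string never matches.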
The second regex is a problem because you are using reserved characters without escaping them. You can enclose them in [], where most (but not all) characters lose their special meaning, but the whole thing would look very messy and you would have to check that you haven't missed any punctuation.
Example:
String sequence = "qwe 123 :@~ ";
String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");
String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");
System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');
This outputs:
without special chars: 'qwe 123 '
spaces as pluses: 'qwe+123++'
If you want to collapse a run of multiple spaces into a single +, use "\s+" as your regex instead (remember to escape the backslash in the Java string).
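As a rough sketch of the difference, the two variants can be compared on the intermediate string from the example above. The class and method names are hypothetical.

```java
class SpaceCollapseDemo {
    // Replaces every single whitespace character with '+'.
    static String spacesToPlus(String s) {
        return s.replaceAll("\\s", "+");
    }

    // Replaces each run of one or more whitespace characters with a single '+'.
    static String spaceRunsToPlus(String s) {
        return s.replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        String withoutSpecialChars = "qwe 123  ";   // two trailing spaces
        System.out.println(spacesToPlus(withoutSpecialChars));     // qwe+123++
        System.out.println(spaceRunsToPlus(withoutSpecialChars));  // qwe+123+
    }
}
```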
Upvotes: 2
Reputation: 52185
This worked for me:
String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
For this input string:
/-+!@#$%^&())";:[]{}\ |wetyk 678dfgh
It yielded this result:
+wetyk+678dfgh
Upvotes: 17
Reputation: 354416
replaceAll expects a regex, so wrap the characters in a character class and escape [, ] and \ inside it:
public static final String specialChars2 = "[`~!@#$%^&*()_+\\[\\]\\\\;\',./{}|:\"<>?]";
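A minimal sketch of how this can be checked is shown below. The names SPECIAL_CHARS, compiles, and clean are made up for illustration. The [ inside the class is written as \\[ here, on the assumption that standard java.util.regex is in use, where an unescaped [ inside a character class opens a nested class.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PunctuationClassDemo {
    // Character class listing the punctuation from the question;
    // '[', ']' and '\' are escaped so they are taken literally.
    static final String SPECIAL_CHARS = "[`~!@#$%^&*()_+\\[\\]\\\\;',./{}|:\"<>?]";

    // Returns true if the string is a syntactically valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Removes the listed punctuation, then turns each space into '+'.
    static String clean(String s) {
        return s.replaceAll(SPECIAL_CHARS, "").replace(" ", "+");
    }

    public static void main(String[] args) {
        // The raw list from the question is not a valid regex.
        System.out.println(compiles("`~!@#$%^&*()_+[]\\;',./{}|:\"<>?")); // false
        System.out.println(compiles(SPECIAL_CHARS));                      // true
        System.out.println(clean("a;b 'c' \"d\""));                       // ab+c+d
    }
}
```

An explicit list like this only removes the characters it names, so a negated class such as [^\w\s] is usually easier to keep complete.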
Upvotes: 7