Reputation: 1373
I have a URI that contains non-ASCII characters, like this:
http://www.abc.de/qq/qq.ww?MIval=typo3_bsl_int_Smtliste&p_smtbez=Schmalbl�ttrigeSomerzischeruchtanb
How can I remove "�" from this URI?
Upvotes: 22
Views: 42049
Reputation: 107526
I'm guessing that the source of the URL is more at fault. Perhaps you're fixing the wrong problem? Removing "strange" characters from a URI might give it an entirely different meaning.
With that said, you may be able to remove all of the non-ASCII characters with a simple string replacement:
String fixed = original.replaceAll("[^\\x20-\\x7e]", "");
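Or, if you only want to drop characters outside the Basic Multilingual Plane (the ones that need four bytes in UTF-8), use the pattern below. Note that it keeps everything up to U+FFFF, so it will not remove "�" if that is the U+FFFD replacement character: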
String fixed = original.replaceAll("[^\\u0000-\\uFFFF]", "");
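To make the difference between the two patterns concrete, here is a minimal sketch. The class name UriCharFilter and its method names are made up for illustration, and the sample string assumes the odd character is the U+FFFD replacement character, which may not match your actual data.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class UriCharFilter {
    // Matches everything except printable ASCII (space through tilde).
    private static final Pattern NON_PRINTABLE_ASCII = Pattern.compile("[^\\x20-\\x7e]");
    // Matches only code points above U+FFFF (outside the Basic Multilingual Plane).
    private static final Pattern NON_BMP = Pattern.compile("[^\\u0000-\\uFFFF]");

    static String keepPrintableAscii(String s) {
        return NON_PRINTABLE_ASCII.matcher(s).replaceAll("");
    }

    static String keepBmp(String s) {
        return NON_BMP.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumption: the stray character is U+FFFD.
        String original = "p_smtbez=Schmalbl\uFFFDttrige";
        System.out.println(keepPrintableAscii(original)); // U+FFFD is removed
        System.out.println(keepBmp(original));            // U+FFFD is kept
    }
}
```

Precompiling the patterns is optional; it just avoids recompiling the regex on every call if you filter many strings.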
Upvotes: 41
Reputation: 413
To remove the non-ASCII characters from a String, the code below worked for me.
String str="<UPC>616043287409ÂÂÂÂ</UPC>";
str = str.replaceAll("[^\\p{ASCII}]", "");
Output:
<UPC>616043287409</UPC>
Upvotes: 6
Reputation: 3766
Use Guava's CharMatcher:
String onlyAscii = CharMatcher.ascii().retainFrom(original);
Upvotes: 4
Reputation: 101
No no no no no, this is not ASCII, it only covers the printable characters: [^\x20-\x7E]
This is the real ASCII range: [^\x00-\x7F]
Otherwise it will strip out newlines and other control characters that are part of the ASCII table!
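A quick way to see the difference is to run both patterns over a string that contains a newline and a tab. This is only a sketch; the class name AsciiRangeDemo and the sample input are invented for the example.

```java
// Hypothetical demo class comparing the two character ranges.
final class AsciiRangeDemo {
    // Keeps only printable ASCII: control characters such as newline and tab are removed too.
    static String keepPrintable(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    // Keeps the whole 7-bit ASCII table, including control characters.
    static String keepAllAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String input = "line1\n\tline2 \u00e4"; // ends with a non-ASCII a-umlaut
        System.out.println(keepPrintable(input)); // newline and tab are gone
        System.out.println(keepAllAscii(input));  // newline and tab survive
    }
}
```

Which one you want depends on the input: for a single-line URI the printable range is probably fine, while for multi-line text the full range avoids gluing lines together.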
Upvotes: 7