M.M

Reputation: 1373

Remove non-ASCII characters from String in Java

I have a URI that contains non-ASCII characters, like this:

http://www.abc.de/qq/qq.ww?MIval=typo3_bsl_int_Smtliste&p_smtbez=Schmalbl�ttrigeSomerzischeruchtanb

How can I remove "�" from this URI?

Upvotes: 22

Views: 42049

Answers (5)

Cᴏʀʏ

Reputation: 107526

I'm guessing that the source of the URL is more at fault. Perhaps you're fixing the wrong problem? Removing "strange" characters from a URI might give it an entirely different meaning.

With that said, you may be able to remove all of the non-ASCII characters with a simple string replacement:

String fixed = original.replaceAll("[^\\x20-\\x7e]", "");

Or, if that is too aggressive, you can strip only the characters outside the Basic Multilingual Plane (the ones that take four bytes in UTF-8), though this will not remove a "�" that is U+FFFD:

String fixed = original.replaceAll("[^\\u0000-\\uFFFF]", "");
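To make the difference between the two patterns concrete, here is a minimal sketch. The class name, method names, and sample string are made up for illustration, and it assumes the "�" in the question is U+FFFD, the Unicode replacement character.

```java
public class StripNonAsciiSketch {
    // Keeps only printable ASCII (space through tilde); everything else is
    // dropped, including control characters such as newlines.
    static String keepPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7e]", "");
    }

    // Drops only supplementary code points (above U+FFFF); every character
    // in the Basic Multilingual Plane is kept, ASCII or not.
    static String dropSupplementary(String s) {
        return s.replaceAll("[^\\u0000-\\uFFFF]", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: one replacement character (U+FFFD) and one
        // supplementary code point written as a surrogate pair.
        String sample = "Schmalbl\uFFFDttrige \uD83D\uDE00";
        System.out.println(keepPrintableAscii(sample));
        System.out.println(dropSupplementary(sample));
    }
}
```

The first call removes both the replacement character and the supplementary character. The second removes only the supplementary character, so the U+FFFD stays in the result.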

Upvotes: 41

Yellesh Chaparthi

Reputation: 413

To remove the non-ASCII characters from a String, the code below worked for me.

String str="<UPC>616043287409ÂÂÂÂ</UPC>";

str = str.replaceAll("[^\\p{ASCII}]", "");

Output:

<UPC>616043287409</UPC>
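The same replacement as a small runnable sketch, in case it is called often. The class and method names are hypothetical, and the stray characters are written as U+00C2 escapes so the source file's encoding does not matter. In `java.util.regex`, `\p{ASCII}` is documented as `[\x00-\x7F]`.

```java
import java.util.regex.Pattern;

public class PosixAsciiSketch {
    // Matches any character outside the POSIX ASCII class (0x00 to 0x7F).
    // Compiled once so repeated calls do not recompile the pattern.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    static String stripNonAscii(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Four U+00C2 characters stand in for the stray characters in the example.
        String str = "<UPC>616043287409\u00C2\u00C2\u00C2\u00C2</UPC>";
        System.out.println(stripNonAscii(str)); // prints <UPC>616043287409</UPC>
    }
}
```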

Upvotes: 6

Juan Rada

Reputation: 3766

Use Guava's CharMatcher:

String onlyAscii = CharMatcher.ascii().retainFrom(original);

Upvotes: 4

Peter L

Reputation: 101

No, no, no: [^\x20-\x7E] is not "everything outside ASCII". It only spares the printable characters.

This is the real ASCII range: [^\x00-\x7F]

Otherwise it will strip out newlines and other control characters that are part of the ASCII table!
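A quick sketch of that difference, using a made-up sample string with a newline, a tab, and one non-ASCII letter (the class and method names are only for illustration):

```java
public class AsciiRangeSketch {
    // Printable range only: newline, tab and other control characters are removed too.
    static String keepPrintable(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    // Full 7-bit ASCII range: control characters survive, non-ASCII is removed.
    static String keepAllAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Ends with U+00F6 (o with diaeresis), the only non-ASCII character.
        String sample = "line one\n\tline tw\u00F6";
        System.out.println(keepPrintable(sample).equals("line oneline tw"));     // true
        System.out.println(keepAllAscii(sample).equals("line one\n\tline tw"));  // true
    }
}
```

Which range is right depends on the input: for a single-line URI the printable range may be what you want, while for multi-line text the full range keeps the line breaks.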

Upvotes: 7

daneshkohan

Reputation: 336

yourstring=yourstring.replaceAll("[^\\p{ASCII}]", "");

Upvotes: 21
