Francesco

Reputation: 1

delete unwanted characters from URL

I have the variable String var = class.getSomething, which contains this URL: http://www.google.com§°§#[]|£%/^<>. The output that comes out is this: http://www.google.comç°§#[]|£%/^<>. How can I delete that Ã? Thanks!

Upvotes: 0

Views: 2065

Answers (5)

laune

Reputation: 31300

The string in var is output using utf-8, which results in the byte sequence:

c2 a7 c2 b0 c2 a7 23 5b 5d 7c c2 a3 25 2f 5e 3c 3e

This happens to be the iso-8859-1 encoding of the characters as you see them:

 § ° §#[]| £%/^<>
Â§Â°Â§#[]|Â£%/^<>

C2 is the encoding for Â.

I'm not sure how the Ã was produced; its encoding is C3.

We need the full code to learn how this happened, and a description of how the character encoding for text files on your system is configured.

Modifying the variable var is useless.
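
The byte-level explanation above can be reproduced with a small sketch. It assumes the text really was encoded as UTF-8 and then read back as ISO-8859-1; the class and method names are invented for illustration, and the special characters are written as Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Space-separated hex dump of the UTF-8 encoding of a string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode as UTF-8, then decode the same bytes with the wrong charset.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The tail of the URL from the question: §°§#[]|£%/^<>
        String tail = "\u00a7\u00b0\u00a7#[]|\u00a3%/^<>";
        System.out.println(utf8Hex(tail));
        System.out.println(misdecode(tail));
    }
}
```

Here utf8Hex returns the byte sequence listed above, and misdecode returns a string with one extra U+00C2 (Â) in front of each of §, ° and £.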

Upvotes: 0

Kenzo_Gilead

Reputation: 2439

You could do this: replace the unwanted character with an empty string.

str = str.replace("Â", "");

That replaces every Â with nothing, which gives the result you want.
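
A self-contained version of this replacement might look like the sketch below. The class and method names are hypothetical, and the character is written as the escape \u00c2 on the assumption that the stray character is U+00C2 (Â), so the result does not depend on how the source file itself is encoded.

```java
final class StripStrayChar {
    // Removes every U+00C2 (capital A with circumflex) from the input.
    // String.replace returns a new String; the original is left unchanged.
    static String stripStray(String s) {
        return s.replace("\u00c2", "");
    }

    public static void main(String[] args) {
        String garbled = "http://www.google.com\u00c2\u00a7\u00c2\u00b0\u00c2\u00a7#[]|\u00c2\u00a3%/^<>";
        System.out.println(stripStray(garbled));
    }
}
```

This only hides the symptom: the other non-ASCII characters stay in the string, and the underlying encoding mismatch is not addressed.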

Upvotes: 1

Lars

Reputation: 335

Do you really want to delete only that one character, or all invalid characters? In the latter case you can check each character with CharUtils.isAsciiPrintable(char ch) from Apache Commons Lang. However, according to RFC 3986 even fewer characters are allowed in URLs (alphanumerics and "-_.+=!*'()~,:;/?$@&%", see Characters allowed in a URL).

In any case, you have to create a new String object (as with replace in the answer by Elias MP, or by putting valid characters one by one into a StringBuilder and converting it to a String), because Strings are immutable in Java.
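
The StringBuilder approach described above could be sketched as follows. The allowed set is simply the list quoted in this answer (ASCII letters, ASCII digits and the listed punctuation), not a full RFC 3986 validator, and the class and method names are made up for the example.

```java
final class UrlCharFilter {
    // Punctuation quoted in the answer above; letters and digits are checked separately.
    private static final String ALLOWED_PUNCTUATION = "-_.+=!*'()~,:;/?$@&%";

    // Explicit ASCII ranges are used because Character.isLetterOrDigit
    // would also accept non-ASCII letters such as U+00C2.
    static boolean isAllowed(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || ALLOWED_PUNCTUATION.indexOf(c) >= 0;
    }

    // Copies only the allowed characters into a new String.
    static String keepAllowed(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isAllowed(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "http://www.google.com\u00c2\u00a7\u00c2\u00b0\u00c2\u00a7#[]|\u00c2\u00a3%/^<>";
        System.out.println(keepAllowed(raw)); // http://www.google.com%/
    }
}
```

Filtering character by character does not check URL structure: the "%" that survives here is not followed by two hex digits, so the result is still not a well-formed URL.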

Upvotes: 0

Mustapha Belmokhtar

Reputation: 1219

Specify the charset as UTF-8 to get rid of the unwanted extra characters:

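    String var = class.getSomething;  // placeholder expression from the question
    // getBytes() uses the platform default charset; StandardCharsets avoids the checked exception
    var = new String(var.getBytes(), java.nio.charset.StandardCharsets.UTF_8);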
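
Because getBytes() without an argument depends on the platform default charset, a variant that names both charsets explicitly may be more predictable. The sketch below assumes the string was originally UTF-8 and was mis-decoded as ISO-8859-1, as the byte analysis in this thread suggests; if a different charset was involved, that charset would have to be substituted. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Recover the original bytes by re-encoding with the charset that was
    // wrongly used for decoding, then decode those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "http://www.google.com\u00c2\u00a7\u00c2\u00b0\u00c2\u00a7#[]|\u00c2\u00a3%/^<>";
        System.out.println(repair(garbled));
    }
}
```

With that input, repair returns the URL with §, ° and £ restored and no Â left. It is still better to fix the place where the text is decoded with the wrong charset than to repair the string afterwards.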

Upvotes: 0

achAmháin

Reputation: 4266

Use String.replace

var = var.replace("Ã", "");

Upvotes: 0
