john

Reputation: 11669

Decoding issue with diacritic characters

What is the difference between the two strings below? When I decode the first string, it works fine and the diacritic characters show up correctly.

String val = "m%C3%B6torhead album";
String decodedVal = URLDecoder.decode(val, StandardCharsets.UTF_8);

But when I decode the string below, the diacritic characters do not show up correctly.

String val = "m%EF%BF%BDtorhead album";
String decodedVal = URLDecoder.decode(val, StandardCharsets.UTF_8);

Can anyone tell me what's wrong here? We receive these strings from an upstream system, so we have no control over how they are produced.

Upvotes: 0

Views: 225

Answers (2)

Henry

Reputation: 43728

The second sequence decodes to U+FFFD REPLACEMENT CHARACTER, which is used to replace an incoming character whose value is unknown or unrepresentable in Unicode.

This means you may see something like �.
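One way such a string can come about, sketched here as an assumption and not as a diagnosis of the actual upstream system: the server holds the text as ISO-8859-1 bytes, reads them as UTF-8, and only then URL-encodes the result. The class and method names are hypothetical, and the `Charset` overload of `URLEncoder.encode` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class LossyUpstreamSketch {
    // Hypothetical upstream bug: ISO-8859-1 bytes are decoded as if they were UTF-8.
    // The single byte 0xF6 (ö in ISO-8859-1) is malformed UTF-8, so the decoder
    // substitutes U+FFFD and the original character is lost.
    static String misreadAsUtf8(String original) {
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // URL-encode using UTF-8, as the upstream system presumably does.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String damaged = misreadAsUtf8("m\u00F6torhead album");
        // The encoded output contains %EF%BF%BD where %C3%B6 was expected.
        System.out.println(encode(damaged));
    }
}
```

Once the replacement has happened, the encoded string carries no trace of which character was originally there.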

There is nothing you can do on the client to fix that. The problem is on the server and needs to be fixed there.

Upvotes: 1

vinay chhabra

Reputation: 587

%C3%B6 is the valid UTF-8 percent-encoding of the character ö, so "m%C3%B6torhead album" decodes correctly. In the second case, "%EF%BF%BD" is also valid UTF-8, but it encodes U+FFFD (the replacement character) rather than ö. The decoder is working as intended; the original character was already lost before the string was encoded.
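To check what each string actually decodes to, a minimal sketch along these lines may help. The class name and the `isDamaged` helper are hypothetical; it assumes Java 10 or later for the `Charset` overload of `URLDecoder.decode`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class DecodeCheck {
    // U+FFFD REPLACEMENT CHARACTER, the code point that %EF%BF%BD stands for.
    static final char REPLACEMENT = '\uFFFD';

    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // True if the decoded text contains U+FFFD, which suggests that a character
    // was lost somewhere before or during encoding.
    static boolean isDamaged(String decoded) {
        return decoded.indexOf(REPLACEMENT) >= 0;
    }

    public static void main(String[] args) {
        String good = decode("m%C3%B6torhead album");
        String bad = decode("m%EF%BF%BDtorhead album");
        System.out.println(good + " damaged=" + isDamaged(good));
        System.out.println(bad + " damaged=" + isDamaged(bad));
    }
}
```

A check like this can only flag the damaged value, for example to log it or report it upstream. It cannot recover the ö.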

Upvotes: 0
