Lucas

Reputation: 132

How to decode non-ASCII URLs

I am writing an HTTP server, and to test what messes it up, I entered ઔஇᆖ into a text field. The client request is

GET /add_text_data?message=%E0%AA%94%E0%AE%87%E1%86%96&category=log&color=black HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

When I used URLDecoder.decode("%E0%AA%94%E0%AE%87%E1%86%96", "UTF-8"), I got ???. How do I fix this?

Upvotes: 0

Views: 104

Answers (2)

Lucas

Reputation: 132

It turns out that this is not an issue with URLDecoder, but with the OutputStream. URLDecoder.decode("%E0%AA%94%E0%AE%87%E1%86%96", "UTF-8").equals("ઔஇᆖ") is actually true. I just needed to set Eclipse to accept UTF-8 output. This question fixed it for me.
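For anyone who wants to check this without relying on the console, here is a minimal sketch. The class name UrlDecodeDemo and the helper decodeUtf8 are made up for illustration. It prints the decoded code points as plain ASCII, so a misconfigured console cannot hide them, and then writes the string through a PrintStream that is explicitly set to UTF-8. Whether the last line displays correctly still depends on the console (for example the Eclipse console encoding setting).

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeDemo {
    // Hypothetical helper: percent-decodes a query value as UTF-8.
    static String decodeUtf8(String encoded) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String decoded = decodeUtf8("%E0%AA%94%E0%AE%87%E1%86%96");

        // Print each code point in ASCII, independent of the console encoding.
        // Prints U+0A94, U+0B87, U+1196 (one per line).
        for (int i = 0; i < decoded.length(); ) {
            int cp = decoded.codePointAt(i);
            System.out.printf("U+%04X%n", cp);
            i += Character.charCount(cp);
        }

        // Write the text itself as UTF-8 bytes instead of the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(decoded);
    }
}
```

If the code points come out right but the last line still shows ???, the string was decoded correctly and only the display side needs fixing.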

Upvotes: 1

user3657661

Reputation: 416

Looks like UTF-8 can't handle it.

You can test decoding here to see what kind of decoding you'd have to use.

http://encoder.mattiasgeniar.be/index.php

Make sure you store the result in a data type that can hold Unicode if you do.

Upvotes: 0
