Reputation: 2868
Is there an Android class that (correctly) encodes URLs containing Unicode characters? For example:
Blue Öyster Cult
Is converted to the following using java.net.URI:
uri.toString()
(java.lang.String) Blue%20Öyster%20Cult
The Ö character is not encoded. Using URLEncoder:
URLEncoder.encode("Blue Öyster Cult", "UTF-8").toString()
(java.lang.String) Blue+%C3%96yster+Cult
It encodes too much (i.e. spaces become "+" and path separators "/" become %2F). If I click on a link containing Unicode characters in the Dolphin web browser, it works correctly, so this can obviously be done. But if I try to open an HttpURLConnection using either of the above strings, I get an HTTP 404 Not Found exception.
Upvotes: 0
Views: 1301
Reputation: 2868
I ended up hacking together a solution that seems to work for this, but is probably not the most robust:
url = new URL(userSuppliedPath);
String context = url.getProtocol();
String hostname = url.getHost();
String thePath = url.getPath();
int port = url.getPort(); // -1 when the URL has no explicit port
thePath = thePath.replaceAll("(^/|/$)", ""); // removes beginning/end slash
String encodedPath = URLEncoder.encode(thePath, "UTF-8"); // percent-encodes unicode characters as UTF-8
encodedPath = encodedPath.replace("+", "%20"); // change + to %20 (space)
encodedPath = encodedPath.replace("%2F", "/"); // change %2F back to slash
String portPart = (port == -1) ? "" : ":" + port; // leave the port out when none was given
urlString = context + "://" + hostname + portPart + "/" + encodedPath;
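For comparison, here is a rough sketch of the same idea using only java.net.URI: the multi-argument constructor quotes illegal characters such as spaces, and toASCIIString() additionally percent-encodes non-ASCII characters as UTF-8. The class and method names (UrlAsciiEncoder, toAsciiUrl) and the example.com URL are made up for illustration. It assumes the input is not already percent-encoded, because this constructor always escapes a literal % again.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlAsciiEncoder {
    // Hypothetical helper: splits a raw URL into its components and rebuilds it
    // through URI so that illegal and non-ASCII characters get escaped.
    static String toAsciiUrl(String userSuppliedUrl) {
        try {
            URL url = new URL(userSuppliedUrl);
            // A port of -1 means "no port", which URI simply leaves out.
            URI uri = new URI(url.getProtocol(), null, url.getHost(), url.getPort(),
                    url.getPath(), url.getQuery(), null);
            // toString() would keep the non-ASCII characters; toASCIIString() escapes them.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode URL: " + userSuppliedUrl, e);
        }
    }

    public static void main(String[] args) {
        // \u00D6 is the O with diaeresis from the question.
        System.out.println(toAsciiUrl("http://example.com/music/Blue \u00D6yster Cult"));
        // prints http://example.com/music/Blue%20%C3%96yster%20Cult
    }
}
```

Unlike the string replacements above, this keeps the path separators untouched without any post-processing, but it has not been tested against every Android version.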
Upvotes: 2
Reputation: 9635
URLEncoder is designed to encode form content, not whole URIs. Encoding / as %2F is intentional, to prevent user input from being interpreted as a directory, and + is a valid encoding of a space in form data. (form data == part of the URI following the ?)
Ideally, you would encode "Blue Öyster Cult" before appending it to your base URI, instead of encoding the whole string. And if "Blue Öyster Cult" is part of the path instead of part of the query string, you have to replace + with %20 yourself. With these restrictions, URLEncoder works fine.
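A minimal sketch of that approach, assuming the name is a single path segment appended to a base URI. The helper name encodeSegment and the example.com base are placeholders, not part of any library:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PathSegmentEncoder {
    // Hypothetical helper: percent-encodes one path segment as UTF-8.
    static String encodeSegment(String segment) {
        try {
            // URLEncoder produces form encoding, so a space comes back as "+".
            // A literal "+" in the input is already %2B at this point, so the
            // replace below only affects spaces.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/artists/"; // placeholder base URI
        // \u00D6 is the O with diaeresis in "Blue Öyster Cult".
        System.out.println(base + encodeSegment("Blue \u00D6yster Cult"));
        // prints http://example.com/artists/Blue%20%C3%96yster%20Cult
    }
}
```

Because only the segment is encoded, a slash inside the name (as in "AC/DC") becomes %2F and is not mistaken for a directory separator, while the slashes in the base URI are never touched.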
Upvotes: 1