Reputation: 1722
I have the following URL:
https://mantis.server.company/download/test/0022450-umlauts_öä_üüü_and_special_chars_%&$#.pdf
There is no way to encode the string beforehand. I simply have to process this string (I know it is not a valid URL) so that the file behind that path can be opened.
    String url = "https://mantis-daun.server.company/download/test/0022450-umlauts_öä_üüü_and_special_chars_%&$#.pdf";
    try {
        url = URLDecoder.decode(url, "UTF-8");
        URL myConnection = new URL(url);
        URLConnection connectMe = myConnection.openConnection();
        // Only for error processing
        HttpURLConnection httpConn = (HttpURLConnection) connectMe;
        InputStream is;
        if (httpConn.getResponseCode() >= 400) {
            is = httpConn.getErrorStream();
        } else {
            is = httpConn.getInputStream();
        }
        BufferedReader rd = new BufferedReader(new InputStreamReader(is));
        String line;
        while ((line = rd.readLine()) != null) {
            System.out.println("-----" + line);
        }
        rd.close();
        InputStream in = connectMe.getInputStream();
        BufferedInputStream bin = new BufferedInputStream(in);
        byte[] buffer = new byte[(int) connectMe.getContentLength()];
        int fi = 0;
        while (fi < buffer.length) {
            fi = fi + bin.read(buffer, fi, buffer.length - fi);
        }
        bin.close();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
With this approach I get:
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "&$"
at java.net.URLDecoder.decode(URLDecoder.java:173)
at org.mssql.main.MSSQLAccess.main(MSSQLAccess.java:34)
If I instead use url = url.replaceAll("%", "%25");
I get:
-----<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
-----<html><head>
-----<title>400 Bad Request</title>
-----</head><body>
-----<h1>Bad Request</h1>
-----<p>Your browser sent a request that this server could not understand.<br />
-----</p>
-----<hr>
java.io.IOException: Server returned HTTP response code: 400 for URL: https://mantis-daun.server.company/download/test/0022450-umlauts_öä_üüü_and_special_chars_%&$#.pdf
-----<address>Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny16 with Suhosin-Patch mod_ssl/2.2.9 OpenSSL/0.9.8o Server at mantis-daun.server.company Port 443</address>
-----</body></html>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:234)
at org.mssql.main.MSSQLAccess.main(MSSQLAccess.java:51)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: https://mantis-daun.server.company/download/test/0022450-umlauts_öä_üüü_and_special_chars_%&$#.pdf
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at org.mssql.main.MSSQLAccess.main(MSSQLAccess.java:39)
If I try to open the "URL" in a normal browser, I also get a "400: BAD REQUEST".
So, is there a way to process the string with umlauts and special characters so that it can be used as a "URL"?
Or is something wrong with the server settings?
Upvotes: 2
Views: 4667
Reputation: 46050
First, as Xavjer pointed out, you need to encode the URL. Next, it makes sense to split the URL and encode only the "textual" parts of the path. The domain name is not percent-encoded (a non-Latin domain name must be encoded with Punycode instead), and the path separators must be preserved, which does not happen when you encode the URL as a whole. So you encode only the "download", "test" and filename+extension parts.
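The splitting described above could look roughly like the sketch below. The class and method names (PathSegmentEncoder, encodeSegment, encodePath) are made up for illustration, and the sketch assumes the string contains no query string or fragment, so every '#', '&' and '?' after the host is treated as a literal filename character. The umlauts are written as Unicode escapes only so that the example does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PathSegmentEncoder {

    // Percent-encodes a single path segment as UTF-8.
    // URLEncoder implements HTML form encoding, so it writes a space as '+';
    // inside a URL path a space must be "%20", hence the replace.
    public static String encodeSegment(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
    }

    // Keeps "scheme://host" untouched and encodes each path segment on its own,
    // so the '/' separators survive. Assumption: no query string or fragment.
    public static String encodePath(String rawUrl) {
        int schemeEnd = rawUrl.indexOf("://");
        if (schemeEnd < 0) {
            return rawUrl; // not in the expected form, leave it alone
        }
        int pathStart = rawUrl.indexOf('/', schemeEnd + 3);
        if (pathStart < 0) {
            return rawUrl; // no path to encode
        }
        StringBuilder out = new StringBuilder(rawUrl.substring(0, pathStart));
        for (String segment : rawUrl.substring(pathStart + 1).split("/", -1)) {
            out.append('/').append(encodeSegment(segment));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00f6 = o-umlaut, \u00e4 = a-umlaut, \u00fc = u-umlaut
        String raw = "https://mantis-daun.server.company/download/test/"
                + "0022450-umlauts_\u00f6\u00e4_\u00fc\u00fc\u00fc_and_special_chars_%&$#.pdf";
        System.out.println(encodePath(raw));
    }
}
```

The result can be passed to new URL(...) without calling URLDecoder first. Whether the server then finds the file still depends on how it decodes the path on its side.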
Upvotes: 0
Reputation: 9224
Well, you are trying to decode the URL, but you should actually encode it to get what you want. It crashes because it tries to decode %&$, which is not a valid escape sequence: a % must be followed by two hex digits.
Encoding will result in: https%3A%2F%2Fmantis-daun.server.company%2Fdownload%2Ftest%2F0022450-umlauts_%C3%B6%C3%A4_%C3%BC%C3%BC%C3%BC_and_special_chars_%25%26%24%23.pdf
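To see the difference between the two calls in isolation, here is a small sketch. The class name DecodeVersusEncode and its helper methods are hypothetical, and only the tail of the filename is used as input to keep it short. Note that encoding a complete URL this way also escapes ':' and '/', as the result above shows, so the fully encoded string is not itself a URL you can open.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DecodeVersusEncode {

    // Returns false when URLDecoder rejects the text because of a malformed
    // escape, i.e. a '%' that is not followed by two hex digits.
    public static boolean isDecodable(String text) {
        try {
            URLDecoder.decode(text, "UTF-8");
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 always exists
        }
    }

    // Form-encodes the complete string, including ':' and '/'.
    public static String encodeAll(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 always exists
        }
    }

    public static void main(String[] args) {
        String name = "chars_%&$#.pdf";
        // The raw name cannot be decoded: "%&$" is not a valid escape.
        System.out.println(isDecodable(name));
        // Encoding turns the raw '%' into "%25", which decodes cleanly again.
        String encoded = encodeAll(name);
        System.out.println(encoded);
        System.out.println(isDecodable(encoded));
        // Encoding a whole URL also escapes the scheme and path separators.
        System.out.println(encodeAll("https://host/a"));
    }
}
```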
Upvotes: 1