Reputation: 1423
I'm parsing links from a webpage and then testing whether they exist. I convert the parsed link strings into URIs. The problem is that some of the links already contain encoded characters, like the following: http://download.microsoft.com/download/6/3/c/63c1d527-9d7e-4fd6-9867-fd0632066740/kinect_qsg%20premium_bndl_en-fr-es.pdf
When I pass it through my code below, I get: http://download.microsoft.com/download/6/3/c/63c1d527-9d7e-4fd6-9867-fd0632066740/kinect_qsg%2520premium_bndl_en-fr-es.pdf
As you can see, the %20 is being encoded a second time. How do I avoid this? Should I decode my strings first? And if so, what's the best way to do this?
URL url = null;
URI uri = null;
try {
    url = new URL(checkUrl);
} catch (MalformedURLException e1) {
    e1.printStackTrace();
}
try {
    uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
} catch (URISyntaxException e1) {
    e1.printStackTrace();
}
Upvotes: 1
Views: 499
Reputation: 5220
You can use:
String decoded = URLDecoder.decode(yourUrl, "UTF-8");
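As a minimal sketch of what that call does to the file name from the question (the class name DecodeDemo and the second sample string are made up for illustration): URLDecoder targets form encoding, so besides turning %20 into a space it also turns a literal + into a space, which may matter if your links contain plus signs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeDemo {
    // Decodes percent-escapes as UTF-8; a literal '+' is also turned into a space.
    public static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // File name taken from the question
        System.out.println(decode("kinect_qsg%20premium_bndl_en-fr-es.pdf"));
        // Hypothetical input showing the '+' handling
        System.out.println(decode("a+b%2Bc"));
    }
}
```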
Upvotes: 1
Reputation: 4690
Try using the URLDecoder class:
URL url = null;
URI uri = null;
String checkUrl = "http://download.microsoft.com/download/6/3/c/63c1d527-9d7e-4fd6-9867-fd0632066740/kinect_qsg%20premium_bndl_en-fr-es.pdf";
try {
    // Decode first so the existing %20 is not escaped again as %2520
    url = new URL(URLDecoder.decode(checkUrl, "UTF-8"));
    // Build the URI in the same try block so a failed URL parse cannot cause a NullPointerException
    uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
    System.out.println(uri.getHost());
} catch (MalformedURLException e1) {
    e1.printStackTrace();
} catch (UnsupportedEncodingException e1) {
    e1.printStackTrace();
} catch (URISyntaxException e1) {
    e1.printStackTrace();
}
The fully qualified name of the class is java.net.URLDecoder.
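For comparison, here is a hedged sketch of why the double encoding happens and one way to sidestep it without decoding. The multi-argument URI constructors escape illegal characters and always escape %, while the single-argument constructor parses a string that is already escaped. The class name UriDemo, its method names, and the example.com links are placeholders, not part of the original code.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriDemo {
    // Parses a link that is already escaped; nothing is escaped a second time.
    public static URI parseEncoded(String link) throws URISyntaxException {
        return new URI(link);
    }

    // Builds a URI from components; this constructor escapes illegal
    // characters in the path and always escapes '%'.
    public static URI buildFromParts(String scheme, String authority, String path)
            throws URISyntaxException {
        return new URI(scheme, authority, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Placeholder link standing in for a parsed, already-escaped href
        URI parsed = parseEncoded("http://example.com/a%20b.pdf");
        System.out.println(parsed);          // escaped form is kept as-is
        System.out.println(parsed.getPath()); // decoded path component

        // Passing an escaped path here reproduces the %2520 from the question
        System.out.println(buildFromParts("http", "example.com", "/a%20b.pdf"));
        // Passing a decoded path yields a single level of escaping
        System.out.println(buildFromParts("http", "example.com", "/a b.pdf"));
    }
}
```

If the links are known to be escaped already, parsing them directly avoids the decode step; if they may be a mix of escaped and unescaped strings, decoding first as shown above is the simpler route.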
Upvotes: 2