Reputation: 343
I want to resolve some URLs. To do that, I use the result of:
new URL(new URL(baseurl), link);
This method seems to fail when baseurl="http://www.site.com" and link="./".
The result is http://www.site.com/./ instead of just http://www.site.com/
How can I solve the problem?
Upvotes: 0
Views: 114
Reputation: 47183
This is a very long and, although informative, largely unhelpful post, but there actually is an answer at the end.
This is all a bit of a sad story. It is clearly completely mad that this:
URI base = new URI("http", "example.org", null, null);
URI link = new URI(null, null, "index.html", null);
System.out.println(base.resolve(link));
Should print:
http://example.orgindex.html
Rather than:
http://example.org/index.html
And yet it does. Why? Because java.net.URI
...
represents a URI reference as defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax
And does so faithfully. In particular, the resolve
method ...
constructs a new hierarchical URI in a manner consistent with RFC 2396, section 5.2
Sadly, the algorithm specified in section 5.2 is wrong. Specifically, although it says that ...
the path component is never undefined, though it may be empty
It does not ensure that the result of resolving a relative URI against a base URI which has an empty path is a valid URI. The problem is in step 6, which deals with the merging of the paths from the base and the relative URI into a buffer which will be used to form the resolved URI. The first two sub-steps of step 6 are:
a) All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.
b) The reference's path component is appended to the buffer string.
If the base URI has an empty path, then after sub-step a, the buffer will be empty. If the relative URI has a path not starting with /, then after sub-step b, the buffer will contain a string not starting with /. The following steps deal with dot normalisation, and do nothing to add a leading /. The final step is:
h) The remaining buffer string is the reference URI's new path component.
So, the resolved URI has a path which does not start with /. Step 7 then builds this into the final string form of the resolved URI without any provision for inserting a /. And so, resolving a relative URI without a leading / against a base URI with an empty path produces nonsense. This is what RFC 2396 specifies, and what java.net.URI does.
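To make the failure concrete, here is a minimal sketch of sub-steps a, b and h quoted above, plus a simplified step 7. It deliberately skips dot normalisation (sub-steps c to g), and the class and method names (Rfc2396Merge, mergePaths, recombine) are hypothetical, chosen only for illustration.

```java
// Hypothetical sketch of RFC 2396, section 5.2, step 6 sub-steps (a), (b), (h)
// and a simplified step 7. Dot normalisation (c-g) is intentionally omitted.
final class Rfc2396Merge {

    // (a) keep the base path up to and including its last "/",
    // (b) append the reference's path, (h) use the buffer as the new path.
    // Nothing here ever adds a leading "/".
    static String mergePaths(String basePath, String refPath) {
        int lastSlash = basePath.lastIndexOf('/');
        // lastIndexOf returns -1 when there is no "/", so this yields ""
        String buffer = basePath.substring(0, lastSlash + 1);
        return buffer + refPath;
    }

    // Simplified step 7: concatenate the parts with no provision
    // for inserting a "/" between the authority and the path.
    static String recombine(String scheme, String authority, String path) {
        return scheme + "://" + authority + path;
    }

    public static void main(String[] args) {
        // Empty base path: the merged path has no leading "/"
        System.out.println(recombine("http", "example.org", mergePaths("", "index.html")));
        // prints http://example.orgindex.html

        // Non-empty base path: the merge behaves as expected
        System.out.println(recombine("http", "example.org", mergePaths("/dir/page.html", "index.html")));
        // prints http://example.org/dir/index.html
    }
}
```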
Whoops!
The story doesn't quite end there. In January 2005, RFC 3986 was published. This obsoleted RFC 2396, and contains a new definition of URI resolution, again in section 5.2. This definition is completely rewritten in a more rigorous (or at least rigorous-looking) style, and specifies the merging of paths in section 5.2.3, which starts off by getting this right:
If the base URI has a defined authority component and an empty path, then return a string consisting of "/" concatenated with the reference's path
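For comparison, a minimal sketch of the RFC 3986 section 5.2.3 merge, including the rule quoted above. The class name Rfc3986Merge and the boolean baseHasAuthority parameter are hypothetical; a real resolver would take this information from the parsed base URI.

```java
// Hypothetical sketch of the "merge" routine from RFC 3986, section 5.2.3.
final class Rfc3986Merge {

    static String merge(boolean baseHasAuthority, String basePath, String refPath) {
        if (baseHasAuthority && basePath.isEmpty()) {
            // The case RFC 2396 got wrong: supply the missing leading "/"
            return "/" + refPath;
        }
        // Otherwise drop everything after the right-most "/" in the base path
        // (or the whole base path if it contains no "/") and append the reference path
        int lastSlash = basePath.lastIndexOf('/');
        return basePath.substring(0, lastSlash + 1) + refPath;
    }

    public static void main(String[] args) {
        System.out.println(merge(true, "", "index.html"));               // prints /index.html
        System.out.println(merge(true, "/dir/page.html", "index.html")); // prints /dir/index.html
    }
}
```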
So, this whole problem would be fixed if Java were updated to conform to an eight-year-old RFC rather than a fourteen-year-old one. That is exactly what bug 6791060 asks for; it was opened in 2009 and last touched in 2010. Sun, i am disappoint.
Anyway, with this understanding in hand, we can see that the right solution is something like:
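// Needs java.net.URI and java.net.URISyntaxException.
// Gives a URI with an empty path the path "/" so that resolve() merges paths correctly.
public static URI fix(URI uri) {
    String path = uri.getPath();
    // getPath() returns null for opaque URIs (e.g. mailto:...), so leave those alone
    if (path != null && path.isEmpty()) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), "/", uri.getQuery(), uri.getFragment());
        }
        catch (URISyntaxException e) {
            AssertionError ae = new AssertionError("highly implausible error fixing URI " + uri);
            ae.initCause(e);
            throw ae;
        }
    }
    else {
        return uri;
    }
}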
fix(new URI(baseurl)).resolve(link);
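As a self-contained sketch, here is the fix() approach from this answer (with a null check for opaque URIs added) applied to the inputs from the question. The class name ResolveWithFix and the resolve(String, String) helper are hypothetical; the helper uses URI.create so that callers do not have to handle URISyntaxException.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical wrapper around this answer's fix() method.
final class ResolveWithFix {

    // Give a URI with an empty path the path "/"; opaque URIs are returned unchanged
    static URI fix(URI uri) {
        String path = uri.getPath();
        if (path != null && path.isEmpty()) {
            try {
                return new URI(uri.getScheme(), uri.getAuthority(), "/", uri.getQuery(), uri.getFragment());
            } catch (URISyntaxException e) {
                throw new AssertionError("highly implausible error fixing URI " + uri, e);
            }
        }
        return uri;
    }

    // Resolve link against baseurl after fixing the base; resolve() also removes "." segments
    static String resolve(String baseurl, String link) {
        return fix(URI.create(baseurl)).resolve(link).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.site.com", "./"));         // prints http://www.site.com/
        System.out.println(resolve("http://www.site.com", "index.html")); // prints http://www.site.com/index.html
    }
}
```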
Upvotes: 1
Reputation: 168825
Use URI.normalize():
import java.net.*;
class TestURL {
    public static void main(String[] args) throws Exception {
        String s = "http://www.site.com/./";
        URL url = new URL(s);
        System.out.println(url);
        URI uri = url.toURI();
        System.out.println(uri.normalize().toURL());
    }
}
Output:
http://www.site.com/./
http://www.site.com/
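Applied to the question directly, a minimal sketch might keep the original new URL(base, link) call and normalize the result afterwards. The class name ResolveThenNormalize and the choice to wrap checked exceptions in IllegalArgumentException are assumptions made only for this illustration.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper: resolve as in the question, then strip "." and ".." segments.
final class ResolveThenNormalize {

    static String resolve(String baseurl, String link) {
        try {
            URL resolved = new URL(new URL(baseurl), link);
            // URI.normalize() removes "." segments and resolvable ".." segments
            return resolved.toURI().normalize().toString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("cannot resolve " + link + " against " + baseurl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.site.com", "./"));                    // prints http://www.site.com/
        System.out.println(resolve("http://www.site.com/dir/page.html", "./other.html")); // prints http://www.site.com/dir/other.html
    }
}
```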
Upvotes: 1
Reputation: 22904
Maybe this will work?
new URI(baseUrl).resolve(link).toURL()
java.net.URI has a resolve method that might be what you're looking for, and toURL() converts the result into a URL.
EDIT
The following seems to work for me:
import java.net.URL;
public class UrlTest {
    private static URL resolve(URL base, String link) throws Exception {
        // Work around the empty-base-path problem by making the link absolute
        if (base.getPath().isEmpty()) {
            link = "/" + link;
        }
        URL u1 = base.toURI().resolve(link).normalize().toURL();
        return u1;
    }
    private static void resolveUrls(URL baseUrl) throws Exception {
        String link = "abcd";
        String link2 = "./";
        String link3 = "./foo";
        System.out.println(resolve(baseUrl, link));
        System.out.println(resolve(baseUrl, link2));
        System.out.println(resolve(baseUrl, link3));
    }
    public static void main(String[] args) throws Exception {
        String baseUrlStr = "http://www.somesite.com";
        URL baseUrl = new URL(baseUrlStr);
        resolveUrls(baseUrl);
        baseUrl = new URL(baseUrlStr + "/index.html");
        resolveUrls(baseUrl);
        baseUrl = new URL(baseUrlStr + "/path/index.html");
        resolveUrls(baseUrl);
    }
}
Upvotes: 1
Reputation: 564
You can try this:
new URL(new URL(baseurl), link.replace("./", ""));
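One thing to be aware of with String.replace here: it removes every occurrence of "./", including the one inside a "../" segment. The sketch below compares it with a hypothetical helper that strips only a single leading "./"; both method names are made up for illustration.

```java
// Hypothetical comparison of two ways to drop "./" from a link.
final class DotSlashStrip {

    // String.replace removes every "./", so "../img.png" loses part of its ".."
    static String replaceEverywhere(String link) {
        return link.replace("./", "");
    }

    // Only drop a "./" prefix, leaving "../" and later segments untouched
    static String stripLeading(String link) {
        return link.startsWith("./") ? link.substring(2) : link;
    }

    public static void main(String[] args) {
        System.out.println(replaceEverywhere("../img.png")); // prints .img.png
        System.out.println(stripLeading("../img.png"));      // prints ../img.png
        System.out.println(stripLeading("./foo"));           // prints foo
    }
}
```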
Upvotes: 1