user1992313

Reputation: 31

In Java, how to get a canonicalized URL

Say I have a space in a URL. What is the right way to convert it to %20? No 'replace' suggestions, please.

For example, if you put "http://test.com/test and test/a" into the browser window, it converts to http://test.com/test%20and%20test/a

If I use URLEncoder, even the / gets converted, which is not what I want.

Thanks,

This seems to be the right way. To add to the question: what if the path also contains non-ASCII characters that I want converted to a valid URL with UTF-8 encoding? For example, I'd want test.com:8080/test and test/pierlag2_carré/a?query=世界 to be converted to test.com:8080/test%20and%20test/pierlag2_carr%C3%A9/a?query=%E4%B8%96%E7%95%8C

Upvotes: 2

Views: 4399

Answers (3)

gregwhitaker

Reputation: 13420

The correct way to build URLs in Java is to create a URI object and fill out each part of the URL. The URI class handles the encoding rules for the distinct parts of the URL as they differ from one to the next.

URLEncoder is not what you want, despite its name: it performs HTML form encoding (application/x-www-form-urlencoded), which turns spaces into + and escapes the / as well.
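To make the difference concrete, here is a minimal sketch comparing the two on the path from the question. The class and method names (EncoderComparison, formEncode, buildUri) are made up for illustration; only URLEncoder.encode and the four-argument URI constructor come from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of any library.
final class EncoderComparison {

    // HTML form encoding: spaces become '+', and '/' is escaped too.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI assembled from its parts: only characters illegal in a path are quoted.
    static String buildUri(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("/test and test/a"));
        // prints %2Ftest+and+test%2Fa
        System.out.println(buildUri("http", "test.com", "/test and test/a"));
        // prints http://test.com/test%20and%20test/a
    }
}
```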

EDIT:

Based on your comments, you are receiving the URL as input to your application and do not control the initial generation of the URL. The real problem you are currently experiencing is that the input you are receiving, the URL, is not a valid URL. URLs / URIs cannot contain spaces per the spec (hence the %20 in the browser).

Since you have no control over the invalid input you are going to be forced to split the incoming URL into its parts:

  • scheme
  • host
  • path

Then you are going to have to split the path and separately encode each segment to ensure that you do not inadvertently encode the / that delimits your path segments.

Finally, you can put all of them back together in a URI object and then pass that around your application.
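The steps above could be sketched roughly as follows. This is only an illustration under some loud assumptions: the input has the form scheme://authority/path with no query string or fragment, and it contains no existing percent escapes (the multi-argument URI constructors always quote %, so an existing %20 would come out as %2520). PathSegmentEncoder, encodeSegment and canonicalize are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of any library.
final class PathSegmentEncoder {

    // Percent-encodes one path segment by letting URI quote it as a path.
    static String encodeSegment(String segment) {
        try {
            // The leading "/" makes URI treat the text as an absolute path;
            // toASCIIString() also encodes non-ASCII characters as UTF-8 escapes.
            String encoded = new URI(null, null, "/" + segment, null).toASCIIString();
            return encoded.substring(1); // drop the helper "/"
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Assumes scheme://authority/path with no query, fragment, or existing escapes.
    static String canonicalize(String rawUrl) {
        int schemeEnd = rawUrl.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("No scheme in: " + rawUrl);
        }
        String scheme = rawUrl.substring(0, schemeEnd);
        int pathStart = rawUrl.indexOf('/', schemeEnd + 3);
        String authority = pathStart < 0
                ? rawUrl.substring(schemeEnd + 3)
                : rawUrl.substring(schemeEnd + 3, pathStart);
        String path = pathStart < 0 ? "" : rawUrl.substring(pathStart);

        StringBuilder out = new StringBuilder(scheme).append("://").append(authority);
        // A limit of -1 keeps trailing empty segments, so a trailing "/" survives.
        String[] segments = path.split("/", -1);
        for (int i = 1; i < segments.length; i++) { // segments[0] is the text before the first "/"
            out.append('/').append(encodeSegment(segments[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("http://test.com/test and test/a"));
        // prints http://test.com/test%20and%20test/a
    }
}
```

Because each segment is encoded on its own, the / separators are never touched. Handling a query string or fragment would need an extra split before the path is processed.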

Upvotes: 3

acdcjunior

Reputation: 135832

Try splitting into a URI with the aid of the URL class:

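import java.net.URI;
import java.net.URL;

// new URL(...) accepts the unescaped space; the multi-argument URI constructor then quotes illegal characters in each component.
String sUrl = "http://test.com:8080/test and test/a?query=world";
URL url = new URL(sUrl); // may throw MalformedURLException
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); // may throw URISyntaxException
String canonical = uri.toString();
System.out.println(canonical);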

Output:

http://test.com:8080/test%20and%20test/a?query=world
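For the non-ASCII follow-up in the question, note that URI.toString() leaves characters such as é or 世界 untouched, while URI.toASCIIString() percent-encodes them as UTF-8. A hedged variant of the snippet above, wrapped in a hypothetical AsciiCanonicalizer class; it assumes the input carries a scheme such as http://, since new URL(...) rejects the scheme-less form used in the question:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, not part of any library.
final class AsciiCanonicalizer {

    static String canonicalize(String sUrl) {
        try {
            URL url = new URL(sUrl); // lenient parse into components
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            // toASCIIString() additionally escapes non-ASCII characters as UTF-8 octets.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of the compiler's encoding.
        String input = "http://test.com:8080/test and test/pierlag2_carr\u00e9/a?query=\u4e16\u754c";
        System.out.println(canonicalize(input));
        // prints http://test.com:8080/test%20and%20test/pierlag2_carr%C3%A9/a?query=%E4%B8%96%E7%95%8C
    }
}
```

As with the original snippet, an input that already contains percent escapes would be double-encoded, because these URI constructors always quote the % character.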

Upvotes: 5

amatellanes

Reputation: 3735

You may find this code useful for replacing blank spaces in your URL:

String myUrl = "http://test.com/test and test/a";
myUrl = myUrl.replaceAll(" ", "%20");

URI url = new URI(myUrl);
System.out.print(url.toString());

Upvotes: -3
