Jeriho

Reputation: 7299

Converting and validating a URL from an untrusted source

I'm parsing a web page and collecting its hrefs. Because the page is an untrusted source, it can contain links with invalid syntax or non-ASCII characters. So, as I understand it, I need to:

1) escape spaces, non-ASCII characters, and any other characters that are not allowed in a URL

2) validate the string produced by step 1. Validity criteria: the URL can be typed into a browser, which can then retrieve the page it points to, and it can be passed to the URL/URI constructors and the corresponding page then retrieved. (I can type some URLs into Firefox that I can't construct instances from in Java.)

3) construct a java.net.URL/URI from the result of step 1 if it is valid

I have found two validation libraries, 1 and 2 (which one do you prefer?), but no adequate library for the first step; tools like java.net.URLDecoder/URLEncoder aren't intended for this purpose.
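
For illustration, step 1 can be sketched with the JDK alone. This is only a rough sketch under assumptions, not a complete solution: the class name HrefNormalizer and the method normalize are made up for this example. It relies on the multi-argument java.net.URI constructor, which quotes characters that are illegal in each component, and on URI.toASCIIString(), which percent-encodes non-ASCII characters as UTF-8. Note that this constructor always quotes '%', so an href that is already percent-encoded would be encoded a second time, and that the java.net.URL constructors are deprecated in recent JDKs.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

// Hypothetical helper, not from any library: escapes a raw absolute href.
public class HrefNormalizer {

    // Returns the escaped URI, or empty if the href cannot be split or rebuilt.
    static Optional<URI> normalize(String href) {
        try {
            // java.net.URL is lenient: it splits the string into components
            // without rejecting spaces or non-ASCII characters.
            URL url = new URL(href.trim());

            // The multi-argument URI constructor quotes illegal characters
            // (for example, a space becomes %20) in each component.
            URI quoted = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());

            // toASCIIString() percent-encodes remaining non-ASCII characters as UTF-8.
            return Optional.of(new URI(quoted.toASCIIString()));
        } catch (MalformedURLException | URISyntaxException e) {
            // Assumption: anything that cannot be rebuilt is treated as invalid.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com/a b?q=x y"));
        System.out.println(normalize("http://example.com/caf\u00e9"));
        System.out.println(normalize("not a url"));
    }
}
```

Relative hrefs would first have to be resolved against the page's base URL, which this sketch does not attempt.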

Upvotes: 1

Views: 145

Answers (1)

Vincent Koeman

Reputation: 751

Can't you just try to construct a URL/URI from it inside a try/catch statement? I think the class's constructor handles the validation automatically.
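
A minimal sketch of that idea, with assumed names (HrefValidator and tryParse are invented for this example). It uses java.net.URI rather than java.net.URL, because as far as I know the single-string URI constructor rejects illegal characters such as spaces, while the URL constructor mostly checks the protocol. The extra http/https and host checks are my own assumption about what "valid" should mean here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper, not from any library: validates an href by constructing a URI.
public class HrefValidator {

    // Returns the parsed URI, or empty if the string is not an acceptable link.
    static Optional<URI> tryParse(String href) {
        try {
            // Throws URISyntaxException on illegal characters, e.g. an unescaped space.
            URI uri = new URI(href.trim());

            // Assumption: only absolute http/https links with a host are wanted.
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return Optional.empty();
            }
            if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
                return Optional.empty();
            }
            return Optional.of(uri);
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("http://example.com/index.html").isPresent());
        System.out.println(tryParse("http://example.com/a b").isPresent());
        System.out.println(tryParse("mailto:someone@example.com").isPresent());
    }
}
```

A syntactically valid URI still says nothing about whether the page exists; that can only be found out by actually requesting it.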

Upvotes: 1
