lockone

Reputation: 93

Standard URL Normalization - Java

Is there a Java package or library that performs standard URL normalization?

5 Components of URL Representation

http://www[dot]example[dot]com:8040/folder/exist?name=sky#head

  1. scheme: http
  2. authority: www.example.com:8040
  3. path: /folder/exist
  4. query: ?name=sky
  5. fragment: #head

The 3 types of standard URL normalization

Syntax-Based Normalization

Scheme-Based Normalization

Protocol-Based Normalization

Upvotes: 9

Views: 9870

Answers (3)

Alain O'Dea

Reputation: 21696

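import java.net.URI;

// Parse the URL once, then read each component through its getter.
URI uri = URI.create("http://www.example.com:8040/folder/exist?name=sky#head");
String scheme = uri.getScheme();       // "http"
String authority = uri.getAuthority(); // "www.example.com:8040"
// ...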

https://docs.oracle.com/javase/1.5.0/docs/api/java/net/URI.html
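As a rough illustration of how java.net.URI could be used for the syntax-based and scheme-based steps from the question, here is a minimal sketch. The class name UrlNormalizeSketch and the helper normalize are made up for this example, and the set of rules applied (lowercase scheme and host, drop the default port for http/https, remove dot segments, use "/" for an empty path) is an assumption about what "standard" normalization should cover, not a complete implementation. Note that URI.normalize() by itself only removes "." and ".." path segments.

```java
import java.net.URI;
import java.util.Locale;

class UrlNormalizeSketch {

    // Hypothetical helper: applies a small, assumed subset of normalization rules.
    static String normalize(String input) {
        // URI.normalize() removes "." and ".." segments from the path.
        URI uri = URI.create(input).normalize();
        if (uri.getScheme() == null || uri.getHost() == null) {
            return uri.toString(); // relative or non-server-based URI: leave as is
        }

        // Scheme and host are case-insensitive, so lowercase them.
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Drop the port when it is the default for the scheme (assumed: http/https only).
        int port = uri.getPort();
        boolean defaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);

        // Rebuild from the raw components so percent-encoding is left untouched.
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1 && !defaultPort) {
            sb.append(':').append(port);
        }
        String path = uri.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints http://www.example.com/folder/exist?name=sky#head
        System.out.println(normalize("HTTP://WWW.Example.COM:80/folder/./a/../exist?name=sky#head"));
    }
}
```

Protocol-based normalization (for example, following redirects to find the canonical URL) needs network access and is not covered by this sketch.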

Upvotes: 3

corsiKa

Reputation: 82579

What about java.net.URL set()?

Upvotes: 0

David J.

Reputation: 32715

As others have mentioned, java.net.URL and/or java.net.URI are some obvious starting points.

Here are some other options:

  1. Galimatias (Spanish for "gibberish") appears to be an opinionated and relatively popular URL normalization library for Java. The source code can be found at github.com/smola/galimatias.

    galimatias started out of frustration with java.net.URL and java.net.URI. Both of them are good for basic use cases, but severely broken for others

  2. The github.com/sentric/url-normalization library provides another (unusual, in my opinion) approach where it reverses the domain portion; e.g. "com.stackoverflow" instead of "stackoverflow.com".
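To make the reversed-domain idea from the second option concrete, here is a small sketch of the transformation itself. ReverseDomainSketch and reverseDomain are hypothetical names for illustration only; this is not the sentric library's API, just the label reversal described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReverseDomainSketch {

    // Hypothetical helper: reverses the dot-separated labels of a host name.
    static String reverseDomain(String host) {
        List<String> labels = Arrays.asList(host.split("\\."));
        Collections.reverse(labels); // reverses in place, writing through to the backing array
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        // prints com.stackoverflow
        System.out.println(reverseDomain("stackoverflow.com"));
    }
}
```

One plausible reason for this ordering is that sorting reversed host names groups URLs from the same domain together, though that motivation is an assumption here.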

You can find other variations on GitHub, some of them implemented in other languages such as Python, Ruby, and PHP.

Upvotes: 5
