While True

Reputation: 423

Java's URL not parsing string properly

I'm creating a URL variable:

URL inputURL = null;
try {
    inputURL = new URL(inputUrlString);
} catch (MalformedURLException e) {
    Log.e(TAG, "Bad Parsing.");
    e.printStackTrace();

    AlertDialog ad = new AlertDialog.Builder(this)
            .setTitle("Error")
            .setMessage("URL is not HTTP-like url.")
            .setCancelable(true).create();
    ad.show();
}

If inputUrlString is "http:", "http:/" or "http:/rubbish", it is parsed as if it were fine, execution continues, and everything crashes later on. Are these really valid URLs? Is it good practice to validate the string with the Pattern class instead?

Upvotes: 0

Views: 2548

Answers (5)

Kevin J. Chase

Reputation: 3956

You have two problems, only one of which you've already encountered.

1. Don't use URL!

The URL class does some weird and unexpected things that you basically never want. For example, the URL.equals method states (emphasis mine):

Two hosts are considered equivalent if both host names can be resolved into the same IP addresses [...]

Since hosts comparison requires name resolution, this operation is a blocking operation.

Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

Use URI instead. Its docs describe a few other shortcomings of the URL class, including:

  • Not all URIs can be represented as URLs:

    • URLs must be absolute (start with a "scheme:").

    • You can't create a URL for a scheme that doesn't already have a (stream) handler.

  • Comparison is not defined.

  • URL.equals and URL.hashCode both block while they consult the Internet.

  • Object equality (and hash codes) can vary based on your DNS setup... Two "equal" URL objects on one machine might be un-equal on another.

Yikes.
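To make the difference concrete, here is a minimal sketch that stays off the network. The class and helper names are made up for illustration. It relies on URI.equals comparing components textually (scheme and host without regard to case) and on the URL(String) constructor rejecting anything that has no scheme.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UriVersusUrl {

    // URI.equals compares the parsed components as text (scheme and host
    // ignore case); it never performs a DNS lookup.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // A relative reference such as "docs/page.html" is a legal URI.
    static boolean isRelativeUri(String s) {
        return !URI.create(s).isAbsolute();
    }

    // The URL(String) constructor throws if the string has no scheme.
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs
    static boolean urlAccepts(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameUri("HTTP://Example.com/a", "http://example.com/a"));
        System.out.println(isRelativeUri("docs/page.html"));
        System.out.println(urlAccepts("docs/page.html"));
    }
}
```

Only the scheme and host are case-insensitive here; the path is still compared exactly, so "/a" and "/A" are different URIs.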

2. Your expectations are wrong.

There is nothing really wrong with a URI like "http:sdfasdfasdfas". It will even work in many browsers... if you happen to have a local host named "sdfasdfasdfas", and it serves Web pages.

The URI class docs, under "URI syntax and components", define URIs as made up of the following parts:

[scheme:]scheme-specific-part[#fragment]

Your example "http:sdfasdfasdfas" has a scheme, making it an "absolute URI". It also has a scheme-specific part, but no fragment. Regarding the scheme-specific part...

An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:

Your example is an opaque URI, and its scheme-specific part may be almost anything, including that weird "hostname".

Your other examples are also valid URIs, with one exception:

  • "http:" would be an absolute opaque URI, but it's missing the required scheme-specific part. ("" isn't good enough).

  • "http:/" is an absolute hierarchical URI with scheme "http" and path "/".

  • "http:/rubbish" is the same, but with the path "/rubbish".

If you wanted the URI class (or the URL class, if you insist) to verify opaque URIs for you, it would have to "know" how valid scheme-specific parts are defined for all schemes... including ones that don't exist yet.
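You can check how java.net.URI classifies these strings yourself. The following is a small sketch; the class name and the classify helper are invented for this answer, and the returned strings are just a convenient way to print the result.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriExamples {

    // Reports whether java.net.URI treats the string as opaque,
    // hierarchical, or not a URI at all.
    static String classify(String s) {
        try {
            URI uri = new URI(s);
            if (uri.isOpaque()) {
                return "opaque, scheme-specific part=" + uri.getSchemeSpecificPart();
            }
            return "hierarchical, host=" + uri.getHost() + ", path=" + uri.getPath();
        } catch (URISyntaxException e) {
            return "invalid";
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"http:", "http:/", "http:/rubbish", "http:sdfasdfasdfas"};
        for (String input : inputs) {
            System.out.println(input + " -> " + classify(input));
        }
    }
}
```

Note that the two hierarchical examples have a path but no host at all, which is probably what trips up the code that uses them later.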

Conclusion

You can declare valid URIs like your example(s) to be invalid if you really want, but you'll probably have to code something of your own to throw a MalformedURLException, or preferably your own more specific exception.

I think you'd be better off accepting the definition of "URI" that the rest of the world uses, and spending your time fixing whatever code is choking on valid URIs.
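If you do decide to reject such inputs, one possible shape for that check is sketched below. Everything in it is an assumption about what "HTTP-like" means for your app: it requires an http or https scheme and a parseable host, and the NotAnHttpUrlException type and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HttpUrlChecker {

    // Hypothetical exception type, more specific than MalformedURLException.
    static class NotAnHttpUrlException extends Exception {
        NotAnHttpUrlException(String message) {
            super(message);
        }
    }

    // Returns the parsed URI, or throws if the input is not an
    // http/https URI with a host component.
    static URI requireHttpUrl(String input) throws NotAnHttpUrlException {
        if (input == null) {
            throw new NotAnHttpUrlException("Input is null");
        }
        final URI uri;
        try {
            uri = new URI(input.trim());
        } catch (URISyntaxException e) {
            throw new NotAnHttpUrlException("Not a valid URI: " + e.getMessage());
        }
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            throw new NotAnHttpUrlException("Scheme must be http or https: " + input);
        }
        // Opaque URIs ("http:foo") and host-less ones ("http:/foo") have no host.
        if (uri.getHost() == null) {
            throw new NotAnHttpUrlException("Missing host: " + input);
        }
        return uri;
    }

    // Convenience wrapper for callers that only need a yes/no answer.
    static boolean isHttpUrl(String input) {
        try {
            requireHttpUrl(input);
            return true;
        } catch (NotAnHttpUrlException e) {
            return false;
        }
    }
}
```

With this in place, the catch block in the question could show its dialog for NotAnHttpUrlException instead of relying on MalformedURLException.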

Upvotes: 1

Yassin Hajaj

Reputation: 21975

As you can see in the source, the URL(String) constructor delegates to another constructor:

URL(URL, String, URLStreamHandler)

Within this constructor, there is a test that checks whether the string contains a ':' and whether the text before the ':' is a valid protocol name. See the code below.


CODE

The following portion looks for a ':'. When it finds one, it calls isValidProtocol to check whether the text before it is a syntactically valid protocol name. Nothing here inspects what comes after the ':', which is why "http:" is an acceptable String for the constructor.

for (i = start ; !aRef && (i < limit) &&
         ((c = spec.charAt(i)) != '/') ; i++) {
    if (c == ':') {

        String s = spec.substring(start, i).toLowerCase();
        if (isValidProtocol(s)) {
            newProtocol = s;
            start = i + 1;
        }
        break;
    }
}

isValidProtocol method

/*
 * Returns true if specified string is a valid protocol name.
 */
private boolean isValidProtocol(String protocol) {
    int len = protocol.length();
    if (len < 1)
        return false;
    char c = protocol.charAt(0);
    if (!Character.isLetter(c))
        return false;
    for (int i = 1; i < len; i++) {
        c = protocol.charAt(i);
        if (!Character.isLetterOrDigit(c) && c != '.' && c != '+' &&
            c != '-') {
            return false;
        }
    }
    return true;
}
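To observe this leniency from the outside, a small experiment along the following lines may help. The class and the describe helper are invented for illustration, and the host check treats both null and the empty string as "no host" so that it does not depend on which of the two a given runtime returns.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlLeniency {

    // Reports whether new URL(spec) succeeds and, if so, whether the
    // resulting URL actually contains a host.
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            String host = url.getHost();
            boolean noHost = (host == null || host.isEmpty());
            return "accepted, " + (noHost ? "no host" : "host=" + host);
        } catch (MalformedURLException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        String[] specs = {"http:", "http:/rubbish", "http://example.com/", "rubbish"};
        for (String spec : specs) {
            System.out.println(spec + " -> " + describe(spec));
        }
    }
}
```

So a successful construction only tells you that the protocol was recognized; checking getHost() afterwards is one cheap way to catch the inputs from the question.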

Source

Upvotes: 0

Kip

Reputation: 558

Is it good practice to validate the string with the Pattern class?

I guess that depends where inputUrlString is coming from. If it's something a user is inputting, it's always a good idea to scrub it.
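As a rough idea of what such scrubbing could look like with Pattern, here is a deliberately coarse sketch. The regular expression is my own assumption of what counts as "HTTP-like" (http or https, then "//", then at least one host character); it is not a full URL grammar, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UrlPrecheck {

    // Coarse pre-check: scheme, "//", a non-empty host-like run, and an
    // optional remainder without whitespace. Not a complete URL grammar.
    private static final Pattern HTTP_LIKE = Pattern.compile(
            "^https?://[^\\s/?#]+(?:[/?#]\\S*)?$", Pattern.CASE_INSENSITIVE);

    // Trims the user input and tests it against the pattern above.
    static boolean looksLikeHttpUrl(String input) {
        return input != null && HTTP_LIKE.matcher(input.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHttpUrl("http://example.com/path?q=1"));
        System.out.println(looksLikeHttpUrl("http:/rubbish"));
    }
}
```

A pattern like this is best treated as a first filter on user input; the string should still go through a real parser before it is used.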

Upvotes: 0

Dominik Reinert

Reputation: 895

Parsing a URL separately only seems to make sense if you want to check for something specific, e.g. whether it is an email address. You can't tell Java to check whether the user entered rubbish. You could simply catch the exception that is thrown when the browser (or whatever consumes the URL) tries to access it.

See the oracle documentation on how to use URL in Java.

Have a look at this post, maybe this is what you are looking for.

Upvotes: 1

Aaron

Reputation: 24802

Throws:
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.

As you can see in the URL javadoc, the constructor itself is quite lenient.

You could use UrlValidator from Apache Commons Validator, or just watch out for errors when you actually use the URL.

Upvotes: 1
