Reputation: 423
I'm creating a URL variable:
URL inputURL = null;
try {
    inputURL = new URL(inputUrlString);
} catch (MalformedURLException e) {
    Log.e(TAG, "Bad Parsing.");
    e.printStackTrace();
    AlertDialog ad = new AlertDialog.Builder(this)
            .setTitle("Error")
            .setMessage("URL is not HTTP-like url.")
            .setCancelable(true).create();
    ad.show();
}
If inputUrlString is "http:", "http:/" or "http:/rubbish", the constructor parses it as if it were fine, execution continues, and everything crashes later on. Are these really valid URLs? Is it good practice to validate the string with the Pattern class instead?
Upvotes: 0
Views: 2548
Reputation: 3956
You have two problems, only one of which you've already encountered.
1. Don't use URL!
The URL class does some weird and unexpected things that you basically never want. For example, the documentation of the URL.equals method states (emphasis mine):
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses [...]
Since hosts comparison requires name resolution, this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.
Use URI instead. Its docs describe a few other shortcomings of the URL class, including:
Not all URIs can be represented as URLs:
URLs must be absolute (start with a "scheme:").
You can't create a URL for a scheme that doesn't already have a (stream) handler.
Comparison is not defined.
URL.equals and URL.hashCode both block while they consult the Internet.
Object equality (and hash codes) can vary based on your DNS setup... Two "equal" URL objects on one machine might be un-equal on another.
Yikes.
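For contrast, here is a minimal sketch of how URI behaves in a hash-based collection. URI.equals and URI.hashCode compare the parsed components as text and never resolve host names. The class name UriEqualityDemo, the helper countDistinct and the example.com addresses are made up for illustration.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names and addresses are illustrative only.
final class UriEqualityDemo {

    // Counts distinct URIs using URI.equals/hashCode, which compare
    // components textually and perform no DNS lookups.
    static int countDistinct(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            seen.add(URI.create(spec));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Scheme and host are compared case-insensitively; the path is not.
        System.out.println(countDistinct(
                "http://example.com/a",
                "HTTP://EXAMPLE.COM/a",
                "http://example.com/A")); // prints 2
    }
}
```

The first two strings collapse into one entry because only the case of the scheme and host differs, while the third stays separate because paths are case-sensitive.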
2. Your expectations are wrong.
There is nothing really wrong with a URI like "http:sdfasdfasdfas". It will even work in many browsers... if you happen to have a local host named "sdfasdfasdfas", and it serves Web pages.
The URI class docs, under "URI syntax and components", define URIs as made up of the following parts:
[scheme:]scheme-specific-part[#fragment]
Your example "http:sdfasdfasdfas" has a scheme, making it an "absolute URI". It also has a scheme-specific part, but no fragment. Regarding the scheme-specific part...
An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:
- mailto:[email protected]
- news:comp.lang.java
- urn:isbn:096139210x
Your example is an opaque URI, and its scheme-specific part may be almost anything, including that weird "hostname".
Your other examples are also valid URIs, with one exception:
- "http:" would be an absolute opaque URI, but it's missing the required scheme-specific part. ("" isn't good enough.)
- "http:/" is an absolute hierarchical URI with scheme "http" and path "/". It has no host.
- "http:/rubbish" is the same, but with the path "/rubbish".
If you wanted the URI class (or the URL class, if you insist) to verify opaque URIs for you, it would have to "know" how valid scheme-specific parts are defined for all schemes... including ones that don't exist yet.
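To see how java.net.URI classifies the strings discussed above, a small sketch like the following can help. The class name UriShapeDemo and the describe helper are hypothetical and exist only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of any library.
final class UriShapeDemo {

    // Returns a short description of how java.net.URI classifies the input.
    static String describe(String spec) {
        try {
            URI uri = new URI(spec);
            if (uri.isOpaque()) {
                return "opaque, scheme-specific part=" + uri.getSchemeSpecificPart();
            }
            return "hierarchical, host=" + uri.getHost() + ", path=" + uri.getPath();
        } catch (URISyntaxException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        String[] specs = {"http:", "http:/", "http:/rubbish", "http:sdfasdfasdfas"};
        for (String spec : specs) {
            System.out.println(spec + " -> " + describe(spec));
        }
    }
}
```

Only "http:" is rejected. The two hierarchical examples parse successfully but report a null host, which is probably what the code further downstream trips over.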
Conclusion
You can declare valid URIs like your example(s) to be invalid if you really want, but you'll probably have to code something of your own to throw a MalformedURLException, or preferably your own more specific exception.
I think you'd be better off accepting the definition of "URI" that the rest of the world uses, and spending your time fixing whatever code is choking on valid URIs.
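If you do decide to reject such input, one possible shape for that custom check is sketched below. It assumes that "acceptable" means an http or https URI with a host. The class HttpUrlCheck, the exception InvalidHttpUrlException and both method names are invented for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the rules below are an assumption about what
// the application considers an acceptable web address.
final class HttpUrlCheck {

    // Hypothetical application-specific exception.
    static final class InvalidHttpUrlException extends Exception {
        InvalidHttpUrlException(String message) {
            super(message);
        }
    }

    // Parses the input and insists on an http(s) scheme and a host.
    static URI parseHttpUrl(String input) throws InvalidHttpUrlException {
        if (input == null) {
            throw new InvalidHttpUrlException("Input is null");
        }
        final URI uri;
        try {
            uri = new URI(input);
        } catch (URISyntaxException e) {
            throw new InvalidHttpUrlException("Not a valid URI: " + input);
        }
        String scheme = uri.getScheme();
        boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        if (!isHttp) {
            throw new InvalidHttpUrlException("Scheme is not http or https: " + input);
        }
        if (uri.getHost() == null) {
            throw new InvalidHttpUrlException("No host in: " + input);
        }
        return uri;
    }

    // Convenience wrapper for callers that only need a yes/no answer.
    static boolean isAcceptable(String input) {
        try {
            parseHttpUrl(input);
            return true;
        } catch (InvalidHttpUrlException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("http://example.com/index.html")); // true
        System.out.println(isAcceptable("http:/rubbish"));                 // false
    }
}
```

The host check is what rejects "http:/" and "http:/rubbish", since both parse as hierarchical URIs without an authority.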
Upvotes: 1
Reputation: 21975
The URL(String) constructor delegates to another constructor:
URL(URL, String, URLStreamHandler)
Within this constructor, there is a check for whether the string contains a ':' and whether the text before the ':' is a valid protocol name. See the code below.
The following portion looks for ':'. When it finds one, it calls isValidProtocol to check whether the text before it is a valid protocol name. That is why "http:" is an acceptable String for the constructor.
// Scan up to the first '/' looking for the ':' that ends the protocol.
for (i = start ; !aRef && (i < limit) &&
         ((c = spec.charAt(i)) != '/') ; i++) {
    if (c == ':') {

        String s = spec.substring(start, i).toLowerCase();
        if (isValidProtocol(s)) {
            newProtocol = s;
            start = i + 1;
        }
        break;
    }
}
The isValidProtocol method:
/*
 * Returns true if specified string is a valid protocol name.
 */
private boolean isValidProtocol(String protocol) {
    int len = protocol.length();
    if (len < 1)
        return false;
    char c = protocol.charAt(0);
    if (!Character.isLetter(c))
        return false;
    for (int i = 1; i < len; i++) {
        c = protocol.charAt(i);
        if (!Character.isLetterOrDigit(c) && c != '.' && c != '+' &&
            c != '-') {
            return false;
        }
    }
    return true;
}
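To make the effect of that check easier to try out, here is a standalone copy of the same logic with a few sample inputs. The wrapper class ProtocolNameDemo is hypothetical; the method body mirrors the excerpt above rather than calling into java.net.URL, where the method is private.

```java
// Hypothetical wrapper class; the method body mirrors the excerpt above.
final class ProtocolNameDemo {

    // Returns true if the string starts with a letter and otherwise
    // contains only letters, digits, '.', '+' or '-'.
    static boolean isValidProtocol(String protocol) {
        int len = protocol.length();
        if (len < 1) {
            return false;
        }
        char c = protocol.charAt(0);
        if (!Character.isLetter(c)) {
            return false;
        }
        for (int i = 1; i < len; i++) {
            c = protocol.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '.' && c != '+' && c != '-') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidProtocol("http"));    // true
        System.out.println(isValidProtocol("rubbish")); // true
        System.out.println(isValidProtocol("1http"));   // false
        System.out.println(isValidProtocol(""));        // false
    }
}
```

Note that this method only looks at the characters of the name, so "rubbish" passes it just as "http" does. It says nothing about what follows the ':'.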
Upvotes: 0
Reputation: 558
Is it good practice to validate it with the Pattern class?
I guess that depends on where inputUrlString is coming from. If it's something a user is inputting, it's always a good idea to scrub it.
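As a rough sketch of what such scrubbing could look like with Pattern, the snippet below accepts only a deliberately narrow shape: http or https, "//", a simple host, an optional port and an optional path. The class name UrlScrubber, the method looksLikeHttpUrl and the regular expression itself are assumptions made for this example. The pattern is far stricter than the URI grammar and will reject some legitimate addresses, for example a query string with no path.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an illustrative simplification,
// not a complete description of valid HTTP URLs.
final class UrlScrubber {

    // http or https, "//", a host of letters, digits, dots and hyphens,
    // an optional numeric port, and an optional path without whitespace.
    private static final Pattern HTTP_URL = Pattern.compile(
            "https?://[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?(?::\\d{1,5})?(?:/\\S*)?",
            Pattern.CASE_INSENSITIVE);

    // Trims the input and checks that the whole string matches the pattern.
    static boolean looksLikeHttpUrl(String input) {
        return input != null && HTTP_URL.matcher(input.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHttpUrl("http://example.com/page")); // true
        System.out.println(looksLikeHttpUrl("http:/rubbish"));           // false
    }
}
```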
Upvotes: 0
Reputation: 895
Parsing a URL separately only seems to make sense if you want to check, for example, whether it is an email address. You can't tell Java to 'look' at whether you or the user entered rubbish. You could simply catch the exception that is thrown when the browser (or whatever consumes the URL) tries to access it.
See the Oracle documentation on how to use URL in Java.
Have a look at this post; maybe this is what you are looking for.
Upvotes: 1
Reputation: 24802
Throws:
MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.
As you can see in the URL Javadoc, the constructor itself is quite lenient.
You could use UrlValidator from Apache Commons Validator, or just watch out for errors when using the URL.
Upvotes: 1