Reputation: 15996
I'm trying to find a way to take a URI/URL string from a user and turn it into a working, canonical form (or fail if the resource isn't valid). It should also verify that the URL exists, so we're checking for both valid syntax and existence.
For instance, a string like google.com should be turned into http://www.google.com, and a string like google.com/insights should be turned into http://www.google.com/insights. A string like http://thiswebsitedoesntexistatall.com should return some sort of error or exception.
I believe part of the solution is likely to be calling an HTTP get_response() method and following redirects until I get a 200 OK status.
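As a rough sketch of that redirect-following idea, assuming the standard library's Net::HTTP.get_response is acceptable: resolve_url, its limit: option, and the injectable fetch: lambda are hypothetical names made up for illustration, not part of any library. A host that does not resolve would typically surface as an exception raised by the socket layer (for example SocketError) rather than as an HTTP status.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow redirects until a 2xx response, or raise.
# `fetch` is injectable so the logic can be exercised without a network.
def resolve_url(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  limit.times do
    response = fetch.call(uri)
    case response
    when Net::HTTPSuccess
      # The URL that finally answered with a 2xx status.
      return uri
    when Net::HTTPRedirection
      # Location may be relative, so resolve it against the current URI.
      uri = URI.join(uri.to_s, response['location'])
    else
      raise "unexpected response #{response.code} for #{uri}"
    end
  end
  raise "too many redirects for #{url}"
end
```

Calling resolve_url('http://google.com/') would then return whatever URL the server finally settles on, which is also where a www prefix would come from if the site redirects to one.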
It seems like the URI.parse() method is not forgiving if you leave off the http. I realize I could write something simple that tries adding http in front, etc., but I was hoping there was some existing gem or little-known library function that would be really forgiving about URLs and canonicalize them for me.
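For reference, the "try adding http in front" fallback could look something like the sketch below, using only the standard library's URI. lenient_parse is a hypothetical name, and the rule it applies (prepend http:// whenever no scheme is present) is an assumption rather than an established convention. It only checks syntax; it does not add www or confirm that the host exists.

```ruby
require 'uri'

# Hypothetical helper: prepend "http://" when no scheme is present, then parse.
def lenient_parse(input)
  text = input.to_s.strip
  # Only add a scheme when the string does not already start with one.
  text = "http://#{text}" unless text.match?(%r{\A[a-z][a-z0-9+.\-]*://}i)
  uri = URI.parse(text)
  if uri.host.nil? || uri.host.empty?
    raise URI::InvalidURIError, "no host in #{input.inspect}"
  end
  # normalize downcases the scheme and host and turns an empty path into "/".
  uri.normalize
end
```

With this, lenient_parse('google.com/insights') yields a URI for http://google.com/insights, while an empty string raises URI::InvalidURIError.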
Both the built-in net/http and HTTParty seem to be too strict for what I'm looking for. Is there a nice way to do this?
Upvotes: 4
Views: 672
Reputation: 160631
There are some problems with what you're asking for:
I'd recommend you look at the Addressable gem and its Addressable::URI class. It's much more full-featured than Ruby's URI. It won't make the decisions for you, but at least it will give you a more complete API and can rewrite/normalize URLs. Cleaning them up and/or determining whether they are good is still left as an exercise for you.
Upvotes: 3