Reputation: 33755
Say I have a string like this: "http://something.example.com/directory/". I want to parse this string and extract the "something" from it.
The first step is obviously to check that the string contains "http://"; otherwise, the string should be ignored.
But how do I then extract just the "something" from that string? Assume that all the strings being evaluated have a similar structure (i.e. I am trying to extract the subdomain of the URL, provided the string is a valid URL, where "valid" means it starts with "http://").
Thanks.
P.S. I know how to check the first part, i.e. I can simply split the string at "http://", but that doesn't solve the full problem because it leaves "something.example.com/directory/". All I want is the "something", nothing else.
Upvotes: 22
Views: 35216
Reputation: 9065
With URI.parse you can get:
require "uri"
uri = URI.parse("http://localhost:3000")
uri.scheme # => "http"
uri.host # => "localhost"
uri.port # => 3000
Upvotes: 2
Reputation: 160551
I'd do it this way:
require 'uri'
uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first
=> "something"
URI is part of Ruby's standard library. It's not the most full-featured parser, but it's plenty capable of handling this task for most URLs. If you have IRIs, look at Addressable::URI.
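To tie this back to the question's requirement that strings not starting with "http://" be ignored, here is a minimal sketch. The method name `first_subdomain` is made up for illustration, and it assumes that returning nil is an acceptable way to "ignore" a string:

```ruby
require 'uri'

# Hypothetical helper (not part of URI): returns the first host label of an
# http:// URL, or nil when the string is not a parseable http URL.
def first_subdomain(str)
  uri = URI.parse(str)
  # Ignore anything that is not http or has no host component.
  return nil unless uri.scheme == 'http' && uri.host
  uri.host.split('.').first
rescue URI::InvalidURIError
  # URI.parse raises on strings it cannot parse (e.g. ones containing spaces).
  nil
end

first_subdomain('http://something.example.com/directory/') # => "something"
first_subdomain('ftp://something.example.com/')            # => nil
first_subdomain('not a url')                               # => nil
```

Note that this simply returns the first label of the host, so for "http://example.com/" it would return "example" even though that URL has no subdomain.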
Upvotes: 40
Reputation: 3465
Well, you can use regular expressions.
Something like /http:\/\/([^\.]+)/, that is, the first group of non-'.' characters after "http://".
Check out http://rubular.com/. You can test your regular expressions against a set of test strings there, and it's great for learning this tool.
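As a rough sketch of how that pattern might be applied in Ruby: the constant `SUBDOMAIN_RE` and the method `subdomain_by_regex` are names invented here, and the pattern is a slightly stricter variant (anchored at the start of the string, and also stopping at "/") rather than the exact regex above:

```ruby
# Hypothetical pattern: must start with "http://", then captures everything
# up to the first '.' (and refuses to run past a '/').
SUBDOMAIN_RE = %r{\Ahttp://([^./]+)\.}

# Returns the captured label, or nil when the string does not match.
def subdomain_by_regex(str)
  match = SUBDOMAIN_RE.match(str)
  match && match[1]
end

subdomain_by_regex('http://something.example.com/directory/') # => "something"
subdomain_by_regex('https://something.example.com/')          # => nil
subdomain_by_regex('see http://something.example.com/')       # => nil
```

Anchoring with \A means the "starts with http://" check and the extraction happen in one step.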
Upvotes: 2
Reputation: 15010
You could use URI like this:
require 'uri'
uri = URI.parse("http://something.example.com/directory/")
puts uri.host
# prints: something.example.com
and you could then just work on the host.
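One way to "work on the host", sketched under a loud assumption: the registered domain is always the last two labels (true for example.com, false for suffixes such as co.uk). The method name `subdomain_from_host` is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: ASSUMES the last two host labels are the registered
# domain and returns whatever precedes them, or nil if nothing does.
def subdomain_from_host(host)
  labels = host.split('.')
  return nil if labels.length <= 2
  labels[0...-2].join('.')
end

host = URI.parse('http://foo.bar.example.com/directory/').host
subdomain_from_host(host)          # => "foo.bar"
subdomain_from_host('example.com') # => nil
```

Unlike taking only the first label, this keeps multi-level subdomains intact, but it gives wrong answers for multi-part public suffixes, which need a suffix-aware library.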
Alternatively, there is the domainatrix gem, mentioned in the question "Remove subdomain from string in ruby":
require 'rubygems'
require 'domainatrix'
url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
and you could just take the subdomain.
Upvotes: 9