jesica

Reputation: 685

How to use a regex to extract domain parts from a string

I have a URL string like "https://example.com". I want to extract the parts of this URL: the protocol, the domain, and the extension. How can I do this using a regular expression?

Upvotes: 0

Views: 361

Answers (2)

fool-dev

Reputation: 7777

In Ruby, I have used something like this:

user:~/workspace $ irb
2.3.4 :018 > url = "https://example.com"
 => "https://example.com" 
2.3.4 :019 > u = url.match(/(?<protocol>[\w]+):\/\/(?<domain>[\w-]+)\.(?<extension>\w+)/)
 => #<MatchData "https://example.com" protocol:"https" domain:"example" extension:"com"> 
2.3.4 :020 > u[:protocol]
 => "https" 
2.3.4 :021 > u[:domain]
 => "example" 
2.3.4 :022 > u[:extension]
 => "com" 

If the URL also has a subdomain, use a regular expression like the one below:

2.3.4 :034 > url = "https://sub.example.com"
 => "https://sub.example.com" 
2.3.4 :035 > u = url.match(/(?<protocol>[\w]+):\/\/(?<domain>[[a-zA-Z0-9]\.-]+)\.(?<extension>\w+)/)
 => #<MatchData "https://sub.example.com" protocol:"https" domain:"sub.example" extension:"com"> 
2.3.4 :036 > u[:protocol]
 => "https" 
2.3.4 :037 > u[:domain]
 => "sub.example" 
2.3.4 :038 > u[:extension]
 => "com" 

On http://rubular.com/ I created a snippet for testing this regular expression, which does not fail with subdomains; see this Rubular.
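
If you need this in more than one place, the subdomain regex could be wrapped in a small method. This is only a sketch: the names `URL_PARTS` and `url_parts` are made up for illustration, the character class is written as `[a-zA-Z0-9.-]`, and I added a `\A` anchor so the match has to start at the beginning of the string. It returns nil when the string does not match, since `match` returns nil in that case.

```ruby
# Hypothetical helper, not part of the irb session above.
# \A anchors the match to the start of the string (an added assumption).
URL_PARTS = /\A(?<protocol>\w+):\/\/(?<domain>[a-zA-Z0-9.-]+)\.(?<extension>\w+)/

def url_parts(url)
  m = URL_PARTS.match(url)
  return nil unless m # no match: the string does not look like a URL

  { protocol: m[:protocol], domain: m[:domain], extension: m[:extension] }
end

p url_parts("https://sub.example.com")
p url_parts("not a url")
```

Note that the extension is just whatever follows the last dot in the host, so a URL like "https://example.co.uk" would give "uk" as the extension and "example.co" as the domain.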

Upvotes: 1

bo-oz

Reputation: 2872

You could easily use a built-in Ruby class for this:

require 'uri'

uri = URI("http://www.example.com")
uri.scheme # => "http"
uri.host   # => "www.example.com"

See also: http://ruby-doc.org/stdlib-2.0.0/libdoc/uri/rdoc/URI.html
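
URI gives you the scheme and the host, but not the domain and extension separately. One way to get those, as a rough sketch, is to split the host at its last dot. The method name `split_url` is hypothetical, and the approach assumes a single-label extension such as "com"; a host without a dot ends up entirely in the extension.

```ruby
require 'uri'

# Hypothetical helper: parse with URI, then split the host at its last dot.
# Assumes a single-label extension ("com", "org"); "example.co.uk" would
# need a public-suffix list to split correctly.
def split_url(url)
  uri = URI.parse(url)
  host = uri.host
  return nil if host.nil? # e.g. "example.com" without a scheme has no host

  domain, _dot, extension = host.rpartition(".")
  { protocol: uri.scheme, domain: domain, extension: extension }
rescue URI::InvalidURIError
  nil # the string could not be parsed as a URI at all
end

p split_url("http://www.example.com")
```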

Upvotes: 2
