kush

Reputation: 16928

Regex to extract subdomain from URL?

I have a bunch of domain names coming in like this:

http://subdomain.example.com (example.com is always example.com, but the subdomain varies).

I need "subdomain".

Could some kind person who had the patience to learn regex help me out?

Upvotes: 19

Views: 39458

Answers (7)

darul75

Reputation: 353

To match subdomains that contain dots, I used this one

https?:\/\/?(?:([^*]+)\.)?domain\.com

to capture everything between the protocol and the domain:

https://sub.domain.com (sub)

https://sub.sub.domain.com (sub.sub) ...
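Here is a quick way to check this in JavaScript. It is a sketch: `domain.com` stands in for your real domain, and `subdomainOf` is a hypothetical helper name. Because `([^*]+)` is greedy, it backtracks only as far as the last `.domain.com`, so a multi-level subdomain comes back whole:

```javascript
// darul75's pattern; "domain.com" is a placeholder for the real domain.
const pattern = /https?:\/\/?(?:([^*]+)\.)?domain\.com/;

// Hypothetical helper: returns the captured subdomain, or null if there is none.
function subdomainOf(url) {
  const m = url.match(pattern);
  return m && m[1] !== undefined ? m[1] : null;
}

for (const url of ["https://sub.domain.com", "https://sub.sub.domain.com", "https://domain.com"]) {
  console.log(url, "->", subdomainOf(url));
}
// https://sub.domain.com -> sub
// https://sub.sub.domain.com -> sub.sub
// https://domain.com -> null
```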

Upvotes: 0

Pandem1c

Reputation: 878

The problem with the above regex is that if you don't know what the protocol or the domain suffix is, you will get some unexpected results. Here is a little regex that accounts for those situations. :D

/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i  //javascript

This should always return your subdomain (if present) in group 1. Here it is in a JavaScript example, but it should also work in any other engine that supports positive look-ahead assertions:

// EXAMPLE of use
var regex = /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i
  , whoKnowsWhatItCouldBe = [
        "www.mydomain.com/whatever/my-site" // matches: www
      , "mydomain.com" // does not match
      , "http://mydomain.com" // does not match
      , "https://mydomain.com" // does not match
      , "banana.com/somethingelse" // does not match
      , "https://banana.com/somethingelse.org" // does not match
      , "http://what-ever.mydomain.mu" // matches: what-ever
      , "dev-www.thisdomain.com/whatever" // matches: dev-www
      , "hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
      , "http://hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
      , "пуст.пустыня.ru" // even non-English chars! Woohoo! matches: пуст
      , "пустыня.ru" // does not match
    ];

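// Run a loop and test it out.
for (var i = 0; i < whoKnowsWhatItCouldBe.length; i++) {
    var result = whoKnowsWhatItCouldBe[i].match(regex);
    if (result !== null) {
        // YAY! We have a match! The subdomain is in group 1.
        console.log(whoKnowsWhatItCouldBe[i] + " -> " + result[1]);
    } else {
        // Boo... No subdomain was found
        console.log(whoKnowsWhatItCouldBe[i] + " -> no subdomain");
    }
}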
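A few more inputs are worth trying with the same regex. This is a sketch, and `firstGroup` is a hypothetical helper. Because `(.*?)` is lazy, only the first label of a multi-level subdomain is returned. A two-part suffix such as `.co.uk` also satisfies the `\..{2,5}` look-ahead, so the domain name itself gets reported as a subdomain:

```javascript
// Pandem1c's regex, unchanged.
const regex = /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i;

// Hypothetical helper: group 1 of the first match, or null when nothing matched.
function firstGroup(input) {
  const m = input.match(regex);
  return m ? m[1] : null;
}

console.log(firstGroup("http://what-ever.mydomain.mu")); // what-ever
console.log(firstGroup("a.b.example.com"));              // a (first label only)
console.log(firstGroup("example.co.uk"));                // example (".co.uk" fools the look-ahead)
console.log(firstGroup("example.com"));                  // null
```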

Upvotes: 52

Sinan Ünür

Reputation: 118166

#!/usr/bin/perl

use strict;
use warnings;

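my $s = 'http://subdomain.example.com';

# Split on "//" or "."; for this input the pieces are
# ('http:', 'subdomain', 'example', 'com'), so index 1 is the subdomain.
my $subdomain = (split qr{/{2}|\.}, $s)[1];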

print "'$subdomain'\n";
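For comparison, here is the same split in JavaScript. This is a sketch, and `subdomainBySplit` is a hypothetical name. It relies on the `//` being present. Without a scheme, index 1 lands on the domain name instead:

```javascript
// Split on "//" or "." and take element 1, mirroring the Perl answer.
function subdomainBySplit(url) {
  return url.split(/\/\/|\./)[1];
}

console.log(subdomainBySplit("http://subdomain.example.com")); // subdomain
console.log(subdomainBySplit("subdomain.example.com"));        // example (no "//" to split on)
```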

Upvotes: 0

czuk

Reputation: 6408

Take the first capture group of:

http://(.*)\.example\.com
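The dots need escaping because an unescaped `.` matches any character. Here is a JavaScript sketch comparing the two forms (the variable names are illustrative). Note also that the greedy `(.*)` keeps every label of a multi-level subdomain:

```javascript
// Unescaped dots: "." matches any character.
const loose = /http:\/\/(.*).example.com/;
// Escaped dots: only a literal "." matches.
const strict = /http:\/\/(.*)\.example\.com/;

console.log("http://subdomain.example.com".match(strict)[1]); // subdomain
console.log("http://a.b.example.com".match(strict)[1]);       // a.b
console.log("http://subXexampleYcom".match(loose) !== null);  // true (false positive)
console.log("http://subXexampleYcom".match(strict));          // null
```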

Upvotes: -1

Draemon

Reputation: 34721

/(http:\/\/)?(([^.]+)\.)?domain\.com/

Then $3 (or \3) will contain "subdomain" if one was supplied.

If you want to have the subdomain in the first group, and your regex engine supports non-capturing groups (shy groups), use this as suggested by palindrom:

/(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/
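Here is a check of both versions in JavaScript. It is a sketch with `domain.com` as the placeholder domain. Since the pattern is not anchored, a multi-level subdomain yields only the label closest to `domain.com`:

```javascript
// Draemon's two versions; "domain.com" is a placeholder.
const capturing = /(http:\/\/)?(([^.]+)\.)?domain\.com/;
const shy = /(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/;

console.log("http://subdomain.domain.com".match(capturing)[3]); // subdomain
console.log("http://subdomain.domain.com".match(shy)[1]);       // subdomain
console.log("http://domain.com".match(shy)[1]);                 // undefined (no subdomain)
console.log("http://a.b.domain.com".match(shy)[1]);             // b (match starts mid-string)
```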

Upvotes: 24

Factor Mystic

Reputation: 26790

Purely the subdomain string (result is $1):

^http://([^.]+)\.domain\.com

Making http:// optional (result is $2):

^(http://)?([^.]+)\.domain\.com

Making the http:// and the subdomain optional (result is $3):

(http://)?(([^.]+)\.)?domain\.com
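Here is a JavaScript sketch of the first two patterns, with the slashes escaped for a regex literal. The anchored forms also reject multi-level subdomains such as `a.b.domain.com`, because `[^.]+` cannot span a dot:

```javascript
// Factor Mystic's first two patterns as JavaScript regex literals.
const withScheme = /^http:\/\/([^.]+)\.domain\.com/;        // result in group 1
const optionalScheme = /^(http:\/\/)?([^.]+)\.domain\.com/; // result in group 2

console.log("http://sub.domain.com".match(withScheme)[1]); // sub
console.log("sub.domain.com".match(optionalScheme)[2]);    // sub
console.log("http://a.b.domain.com".match(withScheme));    // null
```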

Upvotes: 6

Jeremy Salwen

Reputation: 8428

It should just be

\Qhttp://\E(\w+)\.domain\.com

The subdomain will be in the first group.
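`\Q…\E` quoting is supported by Perl, PCRE and Java, but not by JavaScript regexes, so a JavaScript sketch has to escape `http://` by hand. Also note that `\w` matches only letters, digits and underscore, so hyphenated subdomains do not match:

```javascript
// JavaScript has no \Q...\E, so the slashes in "http://" are escaped individually.
const pattern = /http:\/\/(\w+)\.domain\.com/;

console.log("http://subdomain.domain.com".match(pattern)[1]); // subdomain
// \w is [A-Za-z0-9_] here, so the hyphen stops the match.
console.log("http://what-ever.domain.com".match(pattern));    // null
```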

Upvotes: 2
