Reputation: 16928
I have a bunch of domain names coming in like this:
http://subdomain.example.com (example.com is always example.com, but the subdomain varies).
I need "subdomain".
Could some kind person who had the patience to learn regex help me out?
Upvotes: 19
Views: 39458
Reputation: 353
To match subdomains that contain a dot character, I used this one
https?:\/\/?(?:([^*]+)\.)?domain\.com
to capture all characters after the protocol up to the domain.
https://sub.domain.com (sub)
https://sub.sub.domain.com (sub.sub) ...
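For anyone who wants to try this quickly, here is a minimal JavaScript sketch that wraps the pattern above without changing it. The helper name `getDottedSubdomain` is made up for illustration, and `domain.com` is the placeholder domain from this answer.

```javascript
// The pattern from this answer, unchanged. Group 1 holds everything
// between the protocol and ".domain.com".
const dottedSubdomain = /https?:\/\/?(?:([^*]+)\.)?domain\.com/;

// Hypothetical helper: returns the captured subdomain, or null when
// the URL has no subdomain or does not match at all.
function getDottedSubdomain(url) {
  const match = url.match(dottedSubdomain);
  return match && match[1] !== undefined ? match[1] : null;
}

console.log(getDottedSubdomain("https://sub.domain.com"));     // sub
console.log(getDottedSubdomain("https://sub.sub.domain.com")); // sub.sub
console.log(getDottedSubdomain("https://domain.com"));         // null
```

Note that `[^*]+` excludes only the asterisk, so it will also capture slashes or other characters if they appear before `.domain.com`.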
Upvotes: 0
Reputation: 878
The problem with the above regex is that if you do not know what the protocol is, or what the domain suffix is, you will get some unexpected results. Here is a little regex that accounts for those situations. :D
/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i //javascript
This should always return your subdomain (if present) in group 1. It is shown here in a JavaScript example, but it should also work in any other engine that supports positive look-ahead assertions:
// Example of use
var regex = /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i
  , whoKnowsWhatItCouldBe = [
      "www.mydomain.com/whatever/my-site" // matches: www
    , "mydomain.com" // does not match
    , "http://mydomain.com" // does not match
    , "https://mydomain.com" // does not match
    , "banana.com/somethingelse" // does not match
    , "https://banana.com/somethingelse.org" // does not match
    , "http://what-ever.mydomain.mu" // matches: what-ever
    , "dev-www.thisdomain.com/whatever" // matches: dev-www
    , "hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
    , "http://hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
    , "пуст.пустыня.ru" // even non-English chars! Woohoo! matches: пуст
    , "пустыня.ru" // does not match
    ];
// Run a loop and test it out.
for (var i = 0, length = whoKnowsWhatItCouldBe.length; i < length; i++) {
  var result = whoKnowsWhatItCouldBe[i].match(regex);
  if (result != null) {
    // YAY! We have a match! The subdomain is in group 1.
    console.log(result[1]);
  } else {
    // Boo... No subdomain was found
  }
}
Upvotes: 52
Reputation: 118166
#!/usr/bin/perl
use strict;
use warnings;
my $s = 'http://subdomain.example.com';
my $subdomain = (split qr{/{2}|\.}, $s)[1];
print "'$subdomain'\n";
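For readers not using Perl, the same split idea can be sketched in JavaScript. This is an illustration under the same assumptions as the Perl snippet: the input always starts with a protocol and always has a subdomain. The variable names are my own.

```javascript
// Split on "//" or "." so the pieces are: protocol, subdomain, domain, suffix.
const s = "http://subdomain.example.com";
const parts = s.split(/\/{2}|\./);

// Index 0 is "http:", index 1 is the subdomain.
const subdomain = parts[1];
console.log("'" + subdomain + "'"); // 'subdomain'
```

If the input has no subdomain (for example `http://example.com`), index 1 is `example` instead, so this only fits inputs shaped like the one in the question.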
Upvotes: 0
Reputation: 34721
/(http:\/\/)?(([^.]+)\.)?domain\.com/
Then $3 (or \3) will contain "subdomain" if one was supplied.
If you want to have the subdomain in the first group, and your regex engine supports non-capturing groups (shy groups), use this as suggested by palindrom:
/(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/
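To make the group numbering concrete, here is a small JavaScript sketch comparing the two patterns above on one input. The variable names are illustrative, and `domain.com` is the placeholder used in this answer.

```javascript
// With capturing groups, the subdomain ends up in group 3.
const capturing = /(http:\/\/)?(([^.]+)\.)?domain\.com/;
// With non-capturing (?:...) groups, it ends up in group 1.
const nonCapturing = /(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/;

const url = "http://subdomain.domain.com";
const a = url.match(capturing);
const b = url.match(nonCapturing);

console.log(a[3]); // subdomain
console.log(b[1]); // subdomain

// When no subdomain is supplied, the group is undefined rather than "".
const bare = "http://domain.com".match(nonCapturing);
console.log(bare[1]); // undefined
```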
Upvotes: 24
Reputation: 26790
Purely the subdomain string (result is $1):
^http://([^.]+)\.domain\.com
Making http:// optional (result is $2):
^(http://)?([^.]+)\.domain\.com
Making the http:// and the subdomain optional (result is $3):
(http://)?(([^.]+)\.)?domain\.com
Upvotes: 6
Reputation: 8428
It should just be
\Qhttp://\E(\w+)\.domain\.com
The subdomain will be in the first group.
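The `\Q...\E` quoting works in Perl, PCRE, and Java, but JavaScript regexes do not support it. A rough JavaScript equivalent, assuming a hand-written escaping helper (`escapeRegExp` and `firstGroup` are names I made up), could look like this:

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the same pattern as above: literal "http://", then (\w+), then ".domain.com".
const pattern = new RegExp(escapeRegExp("http://") + "(\\w+)\\.domain\\.com");

// Hypothetical helper: returns group 1, or null when there is no match.
function firstGroup(url) {
  const match = url.match(pattern);
  return match ? match[1] : null;
}

console.log(firstGroup("http://sub.domain.com"));     // sub
console.log(firstGroup("http://dev-www.domain.com")); // null
```

The second call returns null because `\w` does not match a hyphen; `[^.]+` from the other answers is the more permissive choice if hyphenated subdomains are possible.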
Upvotes: 2