dannix

Reputation: 235

Perl: extract domain name from email address, including TLD but excluding subdomains

I'm trying to do what the title says and I've got this:

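sub getDomain {
    my $scalarRef = shift;                        # reference to the address string
    my @from_domain = split(/\@/, $$scalarRef);   # (local part, host part)

    # keep only the last two dot-separated labels of the host part
    if ($from_domain[1] =~ m/^.*?(\w+\.\w+)$/) {
        print "$from_domain[1] $1\n" if ($username eq 'xxx');   # debug output; $username comes from the surrounding script
        return $1;
    }
    return;   # no match
}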

Works fine for an address at domain.com, returning domain.com, but of course domain.co.uk returns co.uk and I need domain.co.uk. Any suggestions on how to proceed with this one? I'm guessing a module, and some suggest some kind of TLD lookup table.

Upvotes: 1

Views: 2702

Answers (2)

Dave Cross

Reputation: 69264

I think you're out of luck here. Net::Domain::TLD will give you a list of TLDs, but that's not actually what you want.

As I understand it, given an email address like user@domain.com, you want to get domain.com. The TLD here is "com" and you want the TLD and the section of the domain that comes before it. That's easy.

And then there's user@domain.co.uk. Here the TLD is "uk". But here you don't want the TLD and the section of the domain that precedes it; you want two sections before the TLD.

So perhaps you need a heuristic. If the TLD is three letters long, take the previous section of the domain, and if the TLD is two letters long (a ccTLD), take the previous two sections.

But that doesn't work either. Not all ccTLDs have defined subdomains like .uk does. Take, for example, the popular .tv ccTLD: it allows you to register a domain directly under the ccTLD.
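Written down, the heuristic is short, which makes it easy to see exactly where it breaks. A minimal sketch; guess_domain is a hypothetical helper name, not from any module. It keeps three labels when the TLD has two letters and two labels otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the naive length-based heuristic:
# two-letter (country-code) TLD => keep three labels, otherwise keep two.
sub guess_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    my $keep = length($labels[-1]) == 2 ? 3 : 2;
    $keep = @labels if $keep > @labels;        # never ask for more labels than exist
    return join '.', @labels[-$keep .. -1];
}

print guess_domain('mail.domain.com'), "\n";   # domain.com (correct)
print guess_domain('www.domain.co.uk'), "\n";  # domain.co.uk (correct)
print guess_domain('www.domain.tv'), "\n";     # www.domain.tv (wrong: should be domain.tv)
```

The last call shows the .tv problem: the heuristic assumes a second-level label such as "co" that .tv does not have, so it keeps one label too many.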

So you don't just need a list of TLDs. You also need to understand the rules that each of the TLDs applies to registrations. And those rules could change over time. And new TLDs are being introduced; you'd need to keep up with all of those.

Oh, and one last point. Even big ccTLDs like .uk don't always follow their own rules. There are a few .uk domains that don't sit under a second-level subdomain such as .co.uk; british-library.uk, for example.

You might be able to implement this for a sub-set of domains that you're particularly interested in. But a full solution would be incredibly complex and almost impossible to keep up to date.
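For a sub-set of domains, a lookup table of known suffixes gets you most of the way. A minimal sketch, assuming you maintain the list yourself: %SUFFIX and registered_domain are hypothetical names, and the handful of entries is purely illustrative. A complete list (such as the Public Suffix List) is far longer and changes over time, which is exactly the maintenance problem described above.

```perl
use strict;
use warnings;

# Hypothetical, hand-maintained set of suffixes you care about (illustrative only).
my %SUFFIX = map { $_ => 1 } qw(com net org tv uk co.uk org.uk ac.uk);

# Return the registrable domain: the longest known suffix plus one more label,
# or undef if no known suffix matches.
sub registered_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    for my $i (1 .. $#labels) {                # longest candidate suffix first
        my $suffix = join '.', @labels[$i .. $#labels];
        return join '.', @labels[$i - 1 .. $#labels] if $SUFFIX{$suffix};
    }
    return;                                    # no known suffix: give up
}

# Usage: take the host part after the last '@', then look it up.
my $email = 'foo@mail.domain.co.uk';
my $host  = substr $email, rindex($email, '@') + 1;
print registered_domain($host), "\n";          # domain.co.uk
```

Checking the longest candidate suffix first is what makes "co.uk" win over the bare "uk" entry for mail.domain.co.uk, while british-library.uk still resolves through the plain "uk" entry.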

Upvotes: 2

David-SkyMesh

Reputation: 5171

Don't use a RegExp.

use strict;
use warnings;
use Email::Address;

my ($addr) = Email::Address->parse('foo@domain.co.uk');   # parse() returns a list of address objects
print "Domain: " . $addr->host . "\n";
print "User:   " . $addr->user . "\n";

Prints:

Domain: domain.co.uk
User:   foo

Upvotes: 10
