watzon

Reputation: 2549

Best regular expression for matching the domain part of emails

I am trying to write a regex that matches the domain portion of an email address. Right now I have to use two of them: one that finds all the email addresses, and another that matches the domain. I'm still having issues with this approach.

Right now the code I have is this:

var email_ex = /[a-zA-Z0-9]+(?:(\.|_)[A-Za-z0-9!#$%&'*+/=?^`{|}~-]+)*@(?!([a-zA-Z0-9]*\.[a-zA-Z0-9]*\.[a-zA-Z0-9]*\.))(?:[A-Za-z0-9](?:[a-zA-Z0-9-]*[A-Za-z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?/ig; // Match all email addresses on the page

var domain_ex = /[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU|CO\.UK|AU|LI|LY|IT|IO)/ig; // Match all domains

var match = document.body.innerText; // Text to search; in this case it's the whole body
match = match.match(email_ex); // Run the email RegExp on the body's innerText

I'd rather not have to maintain a list of TLDs, but I haven't been able to find an expression that is good enough.

Upvotes: 0

Views: 294

Answers (5)

Alan Souza

Reputation: 7795

+1 for @strah: the answer works great, but for the input "@example.domain" it returns "example.domain", which in my opinion should be null because that is not a valid email address.

If you want to be extra strict about the email format, you can do as follows:

var r = /[^\s]+@([^\s]+)/;
r.exec("user@testing.domain")[1]; // outputs: testing.domain
r.exec("@testing.domain"); // outputs: null (no match, so check the result before reading [1])
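Because `exec` returns null when nothing matches, indexing its result directly throws on invalid input. A minimal sketch of a guarded version, using the same regex; the helper name `domainOf` and the sample addresses are made up for illustration:

```javascript
// Hypothetical helper (not from the answer): returns the domain part,
// or null when the input has no "something@domain" shape.
function domainOf(address) {
  var m = /[^\s]+@([^\s]+)/.exec(address);
  return m ? m[1] : null; // m is null when the regex does not match
}

console.log(domainOf("user@testing.domain")); // testing.domain
console.log(domainOf("@testing.domain"));     // null
```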

Upvotes: 1

user557597

Reputation:

You should be able to combine finding the emails and capturing the
domain part in a single operation with a single regex.

The example below uses a regex from the HTML5 spec, but you can use yours
and just insert the capture group.

 # http://www.w3.org/TR/html5/forms.html#valid-e-mail-address
 # /[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)/


 [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+ 
 @
 (                                  # (1 start)
      [a-zA-Z0-9] 
      (?:
           [a-zA-Z0-9-]{0,61} 
           [a-zA-Z0-9] 
      )?
      (?:
           \. 
           [a-zA-Z0-9] 
           (?:
                [a-zA-Z0-9-]{0,61} 
                [a-zA-Z0-9] 
           )?
      )*
 )                                  # (1 end)
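
As a rough sketch of how this could be used in JavaScript to pull every domain out of a block of text in one pass: the pattern is the one shown above with the `g` flag added, while the function name `extractDomains` and the sample text are assumptions for illustration.

```javascript
// HTML5-style email pattern with the domain wrapped in capture group 1.
var EMAIL_WITH_DOMAIN = /[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)/g;

// Hypothetical helper: returns the captured domain of every match in text.
function extractDomains(text) {
  var domains = [];
  var m;
  EMAIL_WITH_DOMAIN.lastIndex = 0; // reset the position kept by the g flag
  while ((m = EMAIL_WITH_DOMAIN.exec(text)) !== null) {
    domains.push(m[1]); // group 1 is the domain part
  }
  return domains;
}

console.log(extractDomains("Contact alice@example.com or bob@mail.example.co.uk."));
// [ 'example.com', 'mail.example.co.uk' ]
```

In the question's setting, the text argument would be something like `document.body.innerText`.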

Upvotes: 0

Andreas Rau

Reputation: 31

If you don't need a regex that validates the email address because you already know you have one (and email addresses found on web pages are mostly valid), you can use this:

A domain can't contain an @, so you can greedily consume all characters up to the last @:

(.*)@(.*)

You can then be sure that the domain is in the second group.
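
A small sketch of that behaviour in JavaScript, assuming the input is a single line already known to be an email address; the function name `splitAtLastAt` and the sample input are invented for illustration:

```javascript
// Hypothetical helper: the first (.*) is greedy, so it consumes everything
// up to the LAST @, which leaves only the domain for the second group.
function splitAtLastAt(address) {
  var m = /(.*)@(.*)/.exec(address);
  return m ? { local: m[1], domain: m[2] } : null;
}

console.log(splitAtLastAt("odd@local@example.org"));
// { local: 'odd@local', domain: 'example.org' }
```

Note that `.` does not match line breaks, so this only behaves as described on single-line input.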

Upvotes: 1

Mark

Reputation: 73

I agree you should not have a list of TLDs. Your regex is already missing many, and this is going to become a very long list as generic TLDs become more common. This should get you pretty close:

(?<=@)(?:[a-zA-Z0-9][-a-zA-Z0-9]*[a-zA-Z0-9]\.)+[a-zA-Z0-9]{2,}

Or commented:

(?<=@)                              (?# Check it is preceded by @ )
(?:                                 (?# start of subdomain block )
[a-zA-Z0-9][-a-zA-Z0-9]*[a-zA-Z0-9] (?# subdomain )
\.)+                                (?# end of subdomain, including dot, repeats )
[a-zA-Z0-9]{2,}                     (?# TLD )
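
For use in JavaScript, two caveats apply: the `(?# ... )` comment syntax is not part of JavaScript's regex grammar, and the lookbehind `(?<=@)` needs an engine with ES2018 lookbehind support. A minimal sketch under that assumption, using the uncommented pattern with the `g` flag; the function name and sample text are made up:

```javascript
// The uncommented pattern from above; requires lookbehind support (ES2018).
var DOMAIN_AFTER_AT = /(?<=@)(?:[a-zA-Z0-9][-a-zA-Z0-9]*[a-zA-Z0-9]\.)+[a-zA-Z0-9]{2,}/g;

// Hypothetical helper: returns every domain that directly follows an @.
function domainsAfterAt(text) {
  return text.match(DOMAIN_AFTER_AT) || []; // match() returns null on no match
}

console.log(domainsAfterAt("Mail alice@example.com or bob@mail.example.org"));
// [ 'example.com', 'mail.example.org' ]
```

As written, each label before a dot must be at least two characters long, so a domain such as a.example.com would not match from the @.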

Upvotes: 0

strah

Reputation: 6732

The simplest RegExp: /@([^\s]*)/

var email = "user@example.com"; // placeholder address
var domain = email.match(/@([^\s]*)/)[1];

Upvotes: 4
