strangeQuirks

Reputation: 5920

Regular Expression for email subdomain match

Anyone know of a regular expression that will only match emails that contain a sub-domain in them?

e.g.,

[email protected] or [email protected]

Preferably for use in PostgreSQL.

I tried this:

^[-+.0-9A-Z_a-z]+@[-+.0-9A-Z_a-z]+\.[A-Za-z]{2,4}$

but it also matches [email protected].

Upvotes: 3

Views: 8826

Answers (6)

Naveed

Reputation: 602

Write the regex so that it matches the subdomain together with the domain name and the dot that follows it, like this:

(([a-z0-9]+\.)*[a-z0-9]{2,}\.)

To match the entire email address, use this regex:

([a-z]+[a-z0-9]*[_\.]?[a-z0-9]+)@(([a-z0-9]+\.)*[a-z0-9]{2,}\.)+[a-z]{2,}

naveed@comquest:~$ echo -e "[email protected]\[email protected]\[email protected]" | grep -E "([a-z]+[a-z0-9]*[_\.]?[a-z0-9]+)@(([a-z0-9]+\.)*[a-z0-9]{2,}\.)+[a-z]{2,}"
[email protected]
[email protected]
[email protected]
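Note that the domain group in this pattern can match just once, so an address with no subdomain appears to pass as well. Below is a minimal Python sketch of a stricter variant, assuming "has a subdomain" means at least three dot-separated labels after the `@`. The names `ORIGINAL`, `WITH_SUBDOMAIN` and `has_subdomain`, and the example.com addresses, are illustrative and not part of the answer.

```python
import re

# The pattern from the answer above, unchanged.
ORIGINAL = re.compile(
    r"([a-z]+[a-z0-9]*[_\.]?[a-z0-9]+)@(([a-z0-9]+\.)*[a-z0-9]{2,}\.)+[a-z]{2,}"
)

# Hypothetical variant: at least two dot-terminated labels must precede
# the final label, so "example.com" alone is rejected.
WITH_SUBDOMAIN = re.compile(
    r"[a-z]+[a-z0-9]*[_.]?[a-z0-9]+@(?:[a-z0-9]+\.){2,}[a-z]{2,}"
)


def has_subdomain(address: str) -> bool:
    """Return True if the whole address matches the stricter pattern."""
    return WITH_SUBDOMAIN.fullmatch(address) is not None


if __name__ == "__main__":
    for addr in ("user@example.com", "user@mail.example.com"):
        print(addr, bool(ORIGINAL.fullmatch(addr)), has_subdomain(addr))
```

Counting labels is only a rough heuristic: an address such as `user@example.co.uk` also has three labels and would be accepted even though it has no subdomain in the usual sense.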

You can find a detailed explanation here

Upvotes: 0

Erwin Brandstetter

Reputation: 656351

This simple regular expression does not guarantee valid email addresses, but it reliably eliminates much of the nonsense: if the expression yields FALSE, the address is actually invalid:

SELECT '[email protected]' ~ E'^\\S+@subdomain\\.\\S{2,}+$' 
^  .. start of string
\S+ .. one or more non-space characters
@subdomain .. literally
\. .. a literal dot
\S{2,}+ .. two or more non-space characters
$ .. end of string

All \ doubled for escape string syntax.
And, unlike some of the other answers, it works in PostgreSQL. Tested with v9.1.4. Details in the manual here.

Like @Craig wrote: it's futile to attempt reliable validation. But you can still eliminate much nonsense.

One step further, eliminate multiple @:

E'^[^[:space:]@]+@subdomain\\.[^[:space:]@]{2,}+$' 
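If the same check is needed outside the database, it can be approximated in Python. This is a sketch under a few assumptions: `[^\s@]` stands in for the POSIX class `[^[:space:]@]`, the trailing `+` after `{2,}` is dropped because Python's `re` treats stacked quantifiers differently across versions, and the function name `matches_subdomain` and its `subdomain` parameter are made up for illustration.

```python
import re


def matches_subdomain(address: str, subdomain: str = "subdomain") -> bool:
    """Check that address looks like <something>@<subdomain>.<rest>.

    Neither side of the single '@' may contain whitespace or another '@',
    and at least two characters must follow "<subdomain>.".
    """
    # re.escape keeps any regex metacharacters in the subdomain literal.
    pattern = rf"[^\s@]+@{re.escape(subdomain)}\.[^\s@]{{2,}}"
    # fullmatch plays the role of the ^ and $ anchors in the SQL version.
    return re.fullmatch(pattern, address) is not None


if __name__ == "__main__":
    print(matches_subdomain("name@subdomain.example.com"))
    print(matches_subdomain("name@example.com"))
```

As with the SQL version, a `False` result means the address cannot be on that subdomain; a `True` result does not prove the address is valid.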

Upvotes: 2

atiruz

Reputation: 2858

I think you could work it out yourself by experimenting on this website:

Regex Tester http://regexpal.com/

It lets you try patterns online.

Regards,
Victor Zurita M.

Upvotes: 0

Craig Ringer

Reputation: 324355

Don't, not for validation purposes anyway. It'll only end in pain.

The only reasonable regular expression for validating an email address is one that looks for a "@" symbol and at least one period. Nothing else; even alphanumerics are pointless with the advent of IDNs.
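As a rough illustration, such a deliberately loose check needs no regular expression at all. The Python sketch below assumes the period should appear in the part after the last "@" and that something must precede the "@"; both are my reading of the rule, and the name `looks_like_email` is hypothetical.

```python
def looks_like_email(address: str) -> bool:
    """Loose sanity check: an '@' with text before it and a '.' after it."""
    # rpartition splits on the LAST '@'; sep is '' when there is no '@'.
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(local) and "." in domain


if __name__ == "__main__":
    for candidate in ("user@example.com", "user@localhost", "not-an-address"):
        print(candidate, looks_like_email(candidate))
```

Because it makes no assumption about which characters are allowed, it does not reject internationalized addresses.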

At minimum you need to define exactly what you mean by "subdomain". Everything is a subdomain. A subdomain of what? What is excluded and what is included?

How do you define "subdomain" vs "top level"? Do you mean "a subdomain of a domain that is open to public registration" ? "A subdomain of a subdomain of a domain that is open to public registration" ? At what level of delegation does it become a subdomain for your purposes?

What about government domains, where the "public" that can register domains is very limited, and subdomains-of-subdomains-of-subdomains are the norm? What do you want to match?

How will you cope with the new gTLDs and the fact that the list will change with time? Or with the addition/removal of ccTLDs? What about if a ccTLD changes its policy, beginning to sell direct descendant domains (eg "myname.au") instead of only selling specific sub-registries (eg "myname.org.au")? Will you be dynamically updating your regex, and if so how will you handle addresses that used to be valid and are no longer, or vice versa?

I run into idiotic email validation systems that even reject my main email address [email protected] (no point munging it when it's already all over the 'net) despite it being an entirely valid .id.au domain.

Please don't create another one. If your intent isn't validation, that's cool, but please don't try to validate email address domains with a regex.

Upvotes: 8

Ria

Reputation: 10367

Use this one:

(\w+@[\w.]+\w)

Explanation:

\w+      word characters (a-z, A-Z, 0-9, _) 
         (1 or more times (matching the most amount possible))

@                        '@'

[\w.]+   any character of: word characters (a-z, A-Z, 0-9, _), '.' 
         (1 or more times (matching the most amount possible))

\w       word characters (a-z, A-Z, 0-9, _)
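On its own this pattern matches any address, with or without a subdomain, so some extra filtering is needed. Here is a small Python sketch that uses the pattern to pull addresses out of text and then keeps those whose domain part contains at least two dots. The dot-counting rule is an assumption about what "subdomain" means, and the names `ADDRESS` and `addresses_with_subdomain` are illustrative.

```python
import re

# The pattern from this answer; it is unanchored, so it finds addresses
# embedded in longer text.
ADDRESS = re.compile(r"(\w+@[\w.]+\w)")


def addresses_with_subdomain(text: str) -> list[str]:
    """Return matched addresses whose domain has at least two dots."""
    found = ADDRESS.findall(text)
    # Everything after the first '@' is the domain part of the match.
    return [a for a in found if a.split("@", 1)[1].count(".") >= 2]


if __name__ == "__main__":
    sample = "Contact alice@mail.example.com or bob@example.com."
    print(ADDRESS.findall(sample))
    print(addresses_with_subdomain(sample))
```

The final `\w` in the pattern keeps a sentence-ending period out of the match, which is why `bob@example.com.` is captured without the trailing dot.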

For PostgreSQL, see this link, and this one (it seems to be impossible).

Upvotes: 3

Naren Karthik

Reputation: 349

You need a list of all top-level domains and their structure. The Mozilla project has such a list; it is several hundred lines, so incorporating it into a regex may be cumbersome, although certainly not impossible. https://wiki.mozilla.org/TLD_List Update: superseded by http://publicsuffix.org/
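Rather than folding the list into a regex, the lookup can be done in ordinary code. The Python sketch below shows the idea with a tiny hand-picked set of suffixes standing in for the real list; the set contents and the name `has_subdomain_psl` are assumptions for illustration, and the wildcard and exception rules of the real list are ignored.

```python
# Tiny illustrative subset only; a real check would load the full list
# published at publicsuffix.org.
PUBLIC_SUFFIXES = {"com", "org", "au", "org.au", "id.au", "uk", "co.uk"}


def has_subdomain_psl(address: str) -> bool:
    """Return True if the domain has a label beyond its registrable domain."""
    domain = address.rpartition("@")[2].lower()
    labels = domain.split(".")
    # Try the longest candidate suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # labels[:i] precede the suffix: one of them is the registrable
            # domain, and any further labels form a subdomain.
            return i >= 2
    return False  # unknown suffix: this sketch cannot decide


if __name__ == "__main__":
    for addr in ("user@example.co.uk", "user@mail.example.co.uk"):
        print(addr, has_subdomain_psl(addr))
```

With this approach `user@example.co.uk` is treated as having no subdomain, which simple dot counting gets wrong.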

Basically, it is a link parser. It needs to look in text (from the database), find any text that matches email addresses or URLs, and turn them into links.

Upvotes: 0
