Reputation: 581
I am validating e-mail IDs according to RFC 5322 and the rules described at
https://en.wikipedia.org/wiki/Email_address
Below is sample code that uses Java and a regular expression to validate e-mail IDs.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void checkValid() {
    List<String> emails = new ArrayList<>();
    // Valid email IDs
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("[email protected]");
    emails.add("carlosd'[email protected]");
    emails.add("[email protected]");
    emails.add("admin@mailserver1");
    emails.add("[email protected]");
    emails.add("\" \"@example.org");
    emails.add("\"john..doe\"@example.org");
    // Invalid email IDs
    emails.add("Abc.example.com");
    emails.add("A@b@[email protected]");
    emails.add("a\"b(c)d,e:f;g<h>i[j\\k][email protected]");
    emails.add("just\"not\"[email protected]");
    emails.add("this is\"not\\[email protected]");
    emails.add("this\\ still\"not\\[email protected]");
    emails.add("1234567890123456789012345678901234567890123456789012345678901234+x@example.com");
    emails.add("[email protected]");
    emails.add("[email protected]");
    String regex = "^[a-zA-Z0-9_!#$%&'*+/=? \\\"`{|}~^.-]+@[a-zA-Z0-9.-]+$";
    Pattern pattern = Pattern.compile(regex);
    int i = 0;
    // Print each address with a running number and whether the whole string matches
    for (String email : emails) {
        Matcher matcher = pattern.matcher(email);
        System.out.println(++i + "." + email + " : " + matcher.matches());
    }
}
Actual Output:
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
9.carlosd'[email protected] : true
[email protected] : true
11.admin@mailserver1 : true
[email protected] : true
13." "@example.org : true
14."john..doe"@example.org : true
15.Abc.example.com : false
16.A@b@[email protected] : false
17.a"b(c)d,e:f;g<h>i[j\k][email protected] : false
18.just"not"[email protected] : true
19.this is"not\[email protected] : false
20.this\ still"not\[email protected] : false
21.1234567890123456789012345678901234567890123456789012345678901234+x@example.com : true
[email protected] : true
[email protected] : true
Expected Output:
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
[email protected] : true
9.carlosd'[email protected] : true
[email protected] : true
11.admin@mailserver1 : true
[email protected] : true
13." "@example.org : true
14."john..doe"@example.org : true
15.Abc.example.com : false
16.A@b@[email protected] : false
17.a"b(c)d,e:f;g<h>i[j\k][email protected] : false
18.just"not"[email protected] : false
19.this is"not\[email protected] : false
20.this\ still"not\[email protected] : false
21.1234567890123456789012345678901234567890123456789012345678901234+x@example.com : false
[email protected] : false
[email protected] : false
How can I change my regular expression so that it rejects the following e-mail IDs?
1234567890123456789012345678901234567890123456789012345678901234+x@example.com
[email protected]
[email protected]
just"not"[email protected]
Below are the criteria for the regular expression:
Local-part
The local-part of the email address may use any of these ASCII characters:
A to Z and a to z;
0 to 9;
., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. [email protected] is not allowed but "John..Doe"@example.com is allowed);
space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)[email protected] are both equivalent to [email protected].
Domain
A to Z and a to z;
0 to 9, provided that top-level domain names are not all-numeric;
-, provided that it is not the first or last character.
Comments are allowed in the domain as well as in the local-part; for example, john.smith@(comment)example.com and [email protected](comment) are equivalent to [email protected].
Upvotes: 1
Views: 6215
Reputation:
You could validate against RFC 5322 like this
(reference regex, modified)
"(?im)^(?=.{1,64}@)(?:(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"@)|((?:[0-9a-z](?:\\.(?!\\.)|[-!#\\$%&'\\*\\+/=\\?\\^`\\{\\}\\|~\\w])*)?[0-9a-z]@))(?=.{1,255}$)(?:(\\[(?:\\d{1,3}\\.){3}\\d{1,3}\\])|((?:(?=.{1,63}\\.)[0-9a-z][-\\w]*[0-9a-z]*\\.)+[a-z0-9][\\-a-z0-9]{0,22}[a-z0-9])|((?=.{1,63}$)[0-9a-z][-\\w]*))$"
https://regex101.com/r/ObS3QZ/1
# (?im)^(?=.{1,64}@)(?:("[^"\\]*(?:\\.[^"\\]*)*"@)|((?:[0-9a-z](?:\.(?!\.)|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)?[0-9a-z]@))(?=.{1,255}$)(?:(\[(?:\d{1,3}\.){3}\d{1,3}\])|((?:(?=.{1,63}\.)[0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9])|((?=.{1,63}$)[0-9a-z][-\w]*))$
# Note - remove all comments '(comments)' before running this regex
# Find \([^)]*\) replace with nothing
(?im)  # Case insensitive
^  # BOS
# Local part
(?= .{1,64} @ )  # 64 max chars
(?:
    (  # (1 start), Quoted
        " [^"\\]*
        (?: \\ . [^"\\]* )*
        "
        @
    )  # (1 end)
  |  # or,
    (  # (2 start), Non-quoted
        (?:
            [0-9a-z]
            (?:
                \.
                (?! \. )
              |  # or,
                [-!#\$%&'\*\+/=\?\^`\{\}\|~\w]
            )*
        )?
        [0-9a-z]
        @
    )  # (2 end)
)
# Domain part
(?= .{1,255} $ )  # 255 max chars
(?:
    (  # (3 start), IP
        \[
        (?: \d{1,3} \. ){3}
        \d{1,3} \]
    )  # (3 end)
  |  # or,
    (  # (4 start), Others
        (?:  # Labels (63 max chars each)
            (?= .{1,63} \. )
            [0-9a-z] [-\w]* [0-9a-z]*
            \.
        )+
        [a-z0-9] [\-a-z0-9]{0,22} [a-z0-9]
    )  # (4 end)
  |  # or,
    (  # (5 start), Localdomain
        (?= .{1,63} $ )
        [0-9a-z] [-\w]*
    )  # (5 end)
)
$  # EOS
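The note above about removing comments before matching could be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names are hypothetical, and the simple find-and-replace pattern does not handle nested comments or parentheses that appear inside a quoted local part.

```java
import java.util.regex.Pattern;

final class CommentStripper {
    // Same pattern as in the note above: an opening parenthesis, any run of
    // characters other than ')', then a closing parenthesis.
    private static final Pattern COMMENT = Pattern.compile("\\([^)]*\\)");

    // Hypothetical helper: returns the address with every "(comment)" removed.
    static String stripComments(String address) {
        return COMMENT.matcher(address).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample addresses taken from the criteria quoted in the question.
        System.out.println(stripComments("john.smith(comment)@example.com"));
        System.out.println(stripComments("john.smith@(comment)example.com"));
    }
}
```

Both sample calls yield john.smith@example.com, which can then be passed to the regex.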
How can I make [email protected] a valid email ID? – Mihir Feb 7 at 9:34
I think the spec wants the local part to be either encased in quotes or to begin and end with [0-9a-z].
But, to get around the latter and make [email protected] valid, just replace group 2 with this:
(  # (2 start), Non-quoted
    [0-9a-z]
    (?:
        \.
        (?! \. )
      |  # or,
        [-!#\$%&'\*\+/=\?\^`\{\}\|~\w]
    )*
    @
)  # (2 end)
New regex
"(?im)^(?=.{1,64}@)(?:(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"@)|([0-9a-z](?:\\.(?!\\.)|[-!#\\$%&'\\*\\+/=\\?\\^`\\{\\}\\|~\\w])*@))(?=.{1,255}$)(?:(\\[(?:\\d{1,3}\\.){3}\\d{1,3}\\])|((?:(?=.{1,63}\\.)[0-9a-z][-\\w]*[0-9a-z]*\\.)+[a-z0-9][\\-a-z0-9]{0,22}[a-z0-9])|((?=.{1,63}$)[0-9a-z][-\\w]*))$"
New demo
https://regex101.com/r/ObS3QZ/5
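For illustration, the new regex could be wired into Java roughly like this. The pattern string is the one given above, only split across lines; the class name, the method name and the sample addresses are assumptions made for this sketch, not part of the original test list.

```java
import java.util.regex.Pattern;

final class Rfc5322RegexSketch {
    // The "New regex" from above, concatenated from shorter pieces for readability.
    private static final Pattern EMAIL = Pattern.compile(
        "(?im)^(?=.{1,64}@)"
        + "(?:(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"@)"
        + "|([0-9a-z](?:\\.(?!\\.)|[-!#\\$%&'\\*\\+/=\\?\\^`\\{\\}\\|~\\w])*@))"
        + "(?=.{1,255}$)"
        + "(?:(\\[(?:\\d{1,3}\\.){3}\\d{1,3}\\])"
        + "|((?:(?=.{1,63}\\.)[0-9a-z][-\\w]*[0-9a-z]*\\.)+[a-z0-9][\\-a-z0-9]{0,22}[a-z0-9])"
        + "|((?=.{1,63}$)[0-9a-z][-\\w]*))$");

    // Hypothetical helper: true when the whole input matches the pattern.
    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        // Illustrative addresses chosen for this sketch.
        String[] samples = {
            "simple@example.com",
            "\"john..doe\"@example.org",
            "admin@mailserver1",
            "john..doe@example.com",
            "john.doe@example..com",
            "just\"not\"right@example.com",
            "1234567890123456789012345678901234567890123456789012345678901234+x@example.com"
        };
        for (String email : samples) {
            System.out.println(email + " : " + isValid(email));
        }
    }
}
```

In this sketch the first three samples are accepted and the last four are rejected: consecutive dots in the local part or the domain, an unescaped quote in an unquoted local part, and a local part longer than 64 characters.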
Upvotes: 3
Reputation: 5095
A regular expression is the most difficult and error-prone way to validate email addresses. If you are using an implementation of javax.mail to send the emails, then the simplest way to determine whether an address will work is to use the provided parser: whether or not the address is compliant, if the library cannot use it, then it doesn't matter.
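import javax.mail.internet.InternetAddress;

public static boolean validateEmail(String address) {
    try {
        // If this fails, the mail library can't send emails to this address.
        // The second argument enables strict parsing.
        InternetAddress ia = new InternetAddress(address, true);
        // Reject group addresses and addresses that start with routing information ('@').
        return !ia.isGroup() && ia.getAddress().charAt(0) != '@';
    }
    catch (Throwable t) {
        // Any parsing failure means the address is unusable.
        return false;
    }
}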
Constructing it with false instead would allow addresses without an @domain part, so strict parsing is used. Because the checkAddress function that is invoked internally is private, we can't simply call checkAddress(addr,false,true) to reject routing information (a feature practically designed for fraud through server bouncing); instead, we have to check the first character of the validated address.
Now, you may notice that this validation method is actually compliant with RFC 2822 rather than RFC 5322. The reason is that unless you are implementing your own SMTP sender library, you're using one that depends on this one, and if you have an address that is 5322-valid but 2822-invalid, then your 5322 validation is rendered useless. But if you are implementing your own 5322 SMTP library, then you should learn from the existing ones and write a parser function rather than a regular expression.
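To give an idea of what a parser function might look like, here is a minimal sketch in plain Java. It is not the javax.mail parser and not a complete RFC 5322 implementation: the class and method names are hypothetical, and it deliberately covers only unquoted local parts and plain host names (no quoted strings, comments or IP literals). The 64, 255 and 63 character limits are assumptions borrowed from the regex answer above.

```java
final class SimpleAddressParser {
    // Characters allowed in an unquoted local part besides letters, digits and dots.
    private static final String LOCAL_SPECIALS = "!#$%&'*+-/=?^_`{|}~";

    // Hypothetical entry point: splits at the last '@' and checks both halves.
    static boolean isPlausibleAddress(String address) {
        int at = address.lastIndexOf('@');
        if (at < 1 || at == address.length() - 1) {
            return false; // missing '@', empty local part, or empty domain
        }
        return isValidLocalPart(address.substring(0, at))
            && isValidDomain(address.substring(at + 1));
    }

    private static boolean isValidLocalPart(String local) {
        if (local.length() > 64) {
            return false;
        }
        char previous = '.'; // pretending a dot precedes the input rejects a leading dot
        for (int i = 0; i < local.length(); i++) {
            char c = local.charAt(i);
            if (c == '.') {
                if (previous == '.') {
                    return false; // leading or consecutive dot
                }
            } else if (!isAsciiLetterOrDigit(c) && LOCAL_SPECIALS.indexOf(c) < 0) {
                return false; // character only allowed inside a quoted string
            }
            previous = c;
        }
        return previous != '.'; // reject a trailing dot
    }

    private static boolean isValidDomain(String domain) {
        if (domain.length() > 255) {
            return false;
        }
        // The -1 limit keeps empty trailing labels so "example.com." is rejected.
        for (String label : domain.split("\\.", -1)) {
            if (label.isEmpty() || label.length() > 63) {
                return false;
            }
            if (label.startsWith("-") || label.endsWith("-")) {
                return false; // hyphen must not be first or last
            }
            for (int i = 0; i < label.length(); i++) {
                char c = label.charAt(i);
                if (!isAsciiLetterOrDigit(c) && c != '-') {
                    return false;
                }
            }
        }
        return true;
    }

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }
}
```

Because every rule is an explicit branch, each rejection can be traced to a single line, which is much harder to do with one large regular expression. Quoted local parts such as "john..doe"@example.org are rejected by this sketch simply because it does not implement them.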
Upvotes: 3
Reputation: 1139
It's not the question you asked, but why re-invent the wheel?
Apache Commons Validator has a class that covers this already.
org.apache.commons.validator.routines.EmailValidator.getInstance().isValid(email)
This way you aren't responsible for keeping up to date with changing email format standards.
Upvotes: 4