Gandalf StormCrow

Reputation: 26212

Problems with matching emails with regex

I'm trying to match an email address. Here is what I've come up with so far:

String text = "[email protected]"; 
String regex = "(\\w+)@{1}(\\w+){2,}\\.{1}\\w{2,4}";

This works for the following cases:

[email protected]
[email protected]
[email protected]

So it matches one or more word characters (\w: a letter, digit, or underscore), followed by one @, followed by at least two word characters (the minimum length for a domain name), followed by one . (dot), followed by between two and four word characters (because there are domains such as .us or .mobi).
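For reference, the same pattern can be written piece by piece with Java's Pattern.COMMENTS flag, which ignores whitespace and treats # as the start of a comment. This is only a sketch: the class name and the test addresses are made up for illustration.

```java
import java.util.regex.Pattern;

class AnnotatedRegexDemo {
    // Same regex as in the question, split up with Pattern.COMMENTS.
    static final Pattern ANNOTATED = Pattern.compile(
        "(\\w+)      # local part: one or more word characters\n" +
        "@{1}        # exactly one @\n" +
        "(\\w+){2,}  # domain name: at least two word characters in total\n" +
        "\\.{1}      # exactly one literal dot\n" +
        "\\w{2,4}    # top-level domain: two to four word characters\n",
        Pattern.COMMENTS);

    // matches() requires the whole input to match, not just a part of it.
    static boolean matches(String input) {
        return ANNOTATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical addresses, not the ones from the question.
        System.out.println(matches("gandalf@mydomain.net"));       // true
        System.out.println(matches("gandalf.storm@mydomain.net")); // false
    }
}
```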

However, this expression does not work with emails such as:

[email protected] [email protected] [email protected] [email protected] etc., with any number of subdomains

or

[email protected] [email protected] [email protected] [email protected]

I've just started to learn regex, and I find it interesting to try to solve problems such as these with it, not partially but for every case. Any help would be much appreciated. Thank you.

Upvotes: 1

Views: 207

Answers (3)

Sam Holder

Reputation: 32946

Since you are learning, here is an answer to the question as asked.

Your regex fails on the first group of examples partly because the part before the @ does not allow the '.' character. Changing it to this:

 String regex = "([\\w.]+)@(\\w+){2,}\\.\\w{2,4}";

should allow [email protected], because [\\w.]+ means one or more characters from the character class containing '\w' (any word character: a letter, digit, or underscore) or '.' (which does not need to be escaped inside a character class, where it means a literal dot).
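A quick way to check this locally is a small Java program that runs both patterns against the same input. This is a minimal sketch: the class name and the address are made up for illustration.

```java
import java.util.regex.Pattern;

class LocalPartDotDemo {
    // Pattern from the question: the local part is \w+ only.
    static final Pattern ORIGINAL =
        Pattern.compile("(\\w+)@{1}(\\w+){2,}\\.{1}\\w{2,4}");
    // Pattern from this answer: the local part may also contain '.'.
    static final Pattern WITH_DOT =
        Pattern.compile("([\\w.]+)@(\\w+){2,}\\.\\w{2,4}");

    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    static boolean matchesWithDot(String input) {
        return WITH_DOT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String address = "gandalf.storm@mydomain.net"; // hypothetical address
        System.out.println(matchesOriginal(address)); // false
        System.out.println(matchesWithDot(address));  // true
    }
}
```

Note that [\\w.]+ also accepts local parts made up only of dots, so it is looser than a real address allows.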

This might give you enough help to figure out the rest on your own. After all, that is the point of learning :)

I tested this at http://www.regexplanet.com/simple/index.html, which uses the Java library for the engine.

Upvotes: 0

Thibault Falise

Reputation: 5885

The regex you use is very restrictive:

  • Using the \w character class before the @ does not allow the . character, which explains why gandalf.storm does not match
  • In the domain part of the regex, you only allow two "words" separated with a . character, which excludes "mysubdomain.mydomain.net"

You should try to fix these to match your more complicated examples.
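One possible way to address both points, as a sketch and not a complete email validator: allow '.' in the part before the @, and allow one or more "word." groups before the final label. The pattern, class name, and addresses below are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class SubdomainDemo {
    // [\w.]+     local part: word characters or dots
    // (\w+\.)+   one or more domain labels, each followed by a dot
    // \w{2,4}    final label: two to four word characters
    static final Pattern RELAXED =
        Pattern.compile("[\\w.]+@(\\w+\\.)+\\w{2,4}");

    static boolean matches(String input) {
        return RELAXED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical addresses built from the names used above.
        System.out.println(matches("gandalf.storm@mysubdomain.mydomain.net")); // true
        System.out.println(matches("gandalf@mydomain.net"));                   // true
        System.out.println(matches("gandalf@mydomain"));                       // false
    }
}
```

This still rejects many valid addresses (hyphenated domains, for example, since \w does not include '-').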

As a side note, when you want to match a single character, the {1} quantifier is redundant and can be dropped.
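To illustrate, the two simplified patterns below differ only in the {1} quantifiers and give the same result for each input tried. The patterns, class name, and inputs are made up for this sketch.

```java
import java.util.regex.Pattern;

class RedundantQuantifierDemo {
    static boolean withQuantifier(String input) {
        return Pattern.matches("\\w+@{1}\\w+\\.{1}\\w+", input);
    }

    static boolean withoutQuantifier(String input) {
        return Pattern.matches("\\w+@\\w+\\.\\w+", input);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: one well-formed, one with a doubled @.
        String[] inputs = { "gandalf@mydomain.net", "gandalf@@mydomain.net" };
        for (String input : inputs) {
            System.out.println(input + " -> "
                + withQuantifier(input) + " / " + withoutQuantifier(input));
        }
    }
}
```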

Upvotes: 0

Frank Shearar

Reputation: 17132

This question has been asked many, many times before here on SO. Here's why you don't want to use regexes to parse email addresses. Please note that that monster of a regex doesn't even handle comments.

Upvotes: 2
