Reputation: 405
I'm trying to use the following regular expression to find all e-mail addresses in an HTML string:
RegExp
[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}
HTML
<a href="mailto:[email protected]">[email protected]</a></span>. </p>
I'm using matcher.find(), which is supposed to find substrings, is it not? When I perform the search it comes up empty. Any ideas why?
Upvotes: 1
Views: 78
Reputation: 6223
This way of searching for e-mail addresses is no longer correct now that there are new top-level domains. With the {2,4} limit, this regular expression cannot match a full address in a domain such as site.berlin. Extend the {2,4} range, remove the upper limit, or use:
[A-Za-z0-9-+/.]*@[A-Za-z0-9/.-]*\\.*[A-Za-z]$
(I don't have enough reputation to comment on a post.) As far as I recall, the longest TLD is .international, so {2,4} won't match it. Also remember domains with a dot inside the suffix, such as .co.uk or .de.com. The domain must end with a letter; it cannot end with a digit or a special character. The address may also contain delimiters such as + or -.
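To illustrate the TLD-length point, here is a minimal sketch. It is not the pattern above but the question's pattern with the quantifier changed to {2,}, compiled case-insensitively. The class name, helper method, and sample addresses are made up for the example. Note that with find() the capped version does not come back empty for a long TLD; it returns a truncated match, which is arguably worse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the addresses below are invented sample data.
class TldLengthDemo {
    // Pattern from the question: TLD limited to 2-4 letters.
    static final String CAPPED = "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}";
    // Same pattern with the upper limit removed (an assumption, not the answer's regex).
    static final String OPEN = "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}";

    // Returns the first substring matched by regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "write to info@site.berlin today";
        System.out.println(firstMatch(CAPPED, text));             // info@site.berl
        System.out.println(firstMatch(OPEN, text));               // info@site.berlin
        System.out.println(firstMatch(OPEN, "sales@shop.co.uk")); // sales@shop.co.uk
    }
}
```

Multi-part suffixes such as .co.uk already work here because the domain part [A-Z0-9.-]+ accepts dots; only the final label is constrained by the quantifier.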
Upvotes: 0
Reputation: 124215
Regex is case sensitive by default, so for instance the last part, .net, can't be matched by .[A-Z]{2,4}.
To make your regex case insensitive, add the (?i) flag:
"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}"
or compile it with the Pattern.CASE_INSENSITIVE flag:
Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}",Pattern.CASE_INSENSITIVE);
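For completeness, a small runnable sketch of the difference. The class name, the findEmails helper, and the address jane.doe@example.net are hypothetical (the address in the question is obfuscated, so a stand-in is used here); only the pattern and the flag come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing the effect of CASE_INSENSITIVE on the question's pattern.
class CaseInsensitiveEmailSearch {
    static final String REGEX = "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}";

    // Collects every substring of text matched by REGEX under the given flags.
    static List<String> findEmails(String text, int flags) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the question's HTML; the address is made up.
        String html = "<a href=\"mailto:jane.doe@example.net\">jane.doe@example.net</a></span>. </p>";

        // Without the flag, A-Z never matches the lower-case letters.
        System.out.println(findEmails(html, 0)); // []

        // With the flag, both occurrences (href and link text) are found.
        System.out.println(findEmails(html, Pattern.CASE_INSENSITIVE));
        // [jane.doe@example.net, jane.doe@example.net]
    }
}
```

Expect each address twice in markup like this, once from the mailto: attribute and once from the link text; de-duplicate (for example with a LinkedHashSet) if that matters.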
Upvotes: 3
Reputation: 37520
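A-Z will only match upper case, and there is an extra \ if the pattern is used outside a Java string literal (inside a Java string literal the dot must stay escaped as \\.). Try this...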
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-zA-Z]{2,4}
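A short sketch of how that pattern would look in Java source, assuming it is written as a string literal: the backslash is doubled in the source, but the regex engine still sees a single \. before the TLD. The class name and test addresses are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: explicit upper/lower-case classes, no flags needed.
class EscapeDemo {
    // In Java source the backslash is doubled; the resulting string holds "\." once.
    static final String PATTERN_SOURCE = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[a-zA-Z]{2,4}";
    static final Pattern EMAIL = Pattern.compile(PATTERN_SOURCE);

    // True only if the whole candidate string is matched by the pattern.
    static boolean matchesEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(PATTERN_SOURCE);                        // shows a single backslash before the dot
        System.out.println(matchesEmail("jane.doe@example.net"));  // true
        System.out.println(matchesEmail("jane.doe@examplenet"));   // false: no literal dot before the TLD
    }
}
```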
Upvotes: 2