cantread

Reputation: 405

Why does this Regular Expression not match anything?

I'm trying to use the following regular expression to find all e-mail addresses in an HTML string:

RegExp
[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}

HTML
<a href="mailto:[email protected]">[email protected]</a></span>. </p> 

I'm using matcher.find(), which is supposed to find substrings, is it not? When I perform the search it comes up empty. Any ideas why?

Upvotes: 1

Views: 78

Answers (3)

agilob

Reputation: 6223

This way of searching for e-mail addresses is no longer correct now that there are new top-level domains. With {2,4}, the expression cannot match the whole address in a domain such as site.berlin. Extend the {2,4} bound, drop the upper limit, or look for

[A-Za-z0-9-+/.]*@[A-Za-z0-9/.-]*\\.*[A-Za-z]$

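I don't have enough reputation to comment on the post. TLDs can be as long as .international or longer, so {2,4} won't cover them. Also remember domains with a dot inside the suffix, like .co.uk or .de.com. The domain must end with a letter; it cannot end with a digit or a special character. The local part of an address may contain delimiters such as + or -. Note that the trailing $ anchors the match to the end of the input, so drop it when searching inside a larger string with find().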
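To illustrate the point about TLD length, here is a minimal sketch comparing the {2,4} bound with an open-ended {2,} bound. The class name TldLengthDemo, the helper firstMatch, and the sample addresses are made up for illustration; both patterns are compiled case-insensitively so that only the length bound differs. The {2,} variant is one possible relaxation, not a full e-mail validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TldLengthDemo {
    // Pattern from the question: TLD limited to 2-4 letters.
    static final Pattern SHORT_TLD = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // Same pattern with no upper bound on the TLD length (an assumption, not a standard).
    static final Pattern ANY_TLD = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}", Pattern.CASE_INSENSITIVE);

    // Returns the first substring matched by the pattern, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "Contact: info@site.berlin or sales@example.co.uk";
        // find() does not require a word boundary, so {2,4} cuts the TLD short.
        System.out.println(firstMatch(SHORT_TLD, text)); // info@site.berl
        System.out.println(firstMatch(ANY_TLD, text));   // info@site.berlin
    }
}
```

With find(), the {2,4} version does not fail outright on a long TLD once case is handled; it returns a truncated address, which can be harder to notice than an empty result.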

Upvotes: 0

Pshemo

Reputation: 124215

Regex is case-sensitive by default, so, for instance, the last part .net can't be matched by \\.[A-Z]{2,4}.

To make your regex case-insensitive, add the (?i) flag:

"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}"

or compile it with the Pattern.CASE_INSENSITIVE flag:

Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);
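As a rough end-to-end sketch of this fix, the snippet below compiles the question's pattern with Pattern.CASE_INSENSITIVE and collects every match with matcher.find(). The class name EmailFinder, the method findEmails, and the address jane.doe@example.net are placeholders for illustration, since the real address in the question's HTML is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    // The question's pattern, compiled so that [A-Z] also matches a-z.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // Collects every substring of the input that the pattern matches.
    static List<String> findEmails(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Placeholder HTML with a made-up address, shaped like the question's snippet.
        String html = "<a href=\"mailto:jane.doe@example.net\">jane.doe@example.net</a></span>. </p>";
        System.out.println(findEmails(html));
    }
}
```

The address appears twice in the result because it occurs once in the href attribute and once in the link text; put the results in a Set if duplicates are unwanted.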

Upvotes: 3

Anthony Chu

Reputation: 37520

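A-Z matches only upper-case letters. The doubled backslash is required inside a Java string literal, but it is an extra \ if the pattern is used as a raw regex. Try this: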

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-zA-Z]{2,4}
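The backslash question depends on where the pattern is written. The small sketch below, with the made-up class name EscapeDemo, shows that the Java string literal "\\." holds just two characters, a backslash and a dot, which the regex engine then reads as a literal dot.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, "\\." is a two-character string: a backslash followed by a dot.
    static final String DOT_REGEX = "\\.";

    // True only when the text is exactly "a", a literal dot, then "b".
    static boolean matchesLiteralDot(String text) {
        // The regex engine receives a\.b, where \. means a literal dot.
        return Pattern.matches("a" + DOT_REGEX + "b", text);
    }

    public static void main(String[] args) {
        System.out.println(DOT_REGEX.length());       // 2
        System.out.println(matchesLiteralDot("a.b")); // true
        System.out.println(matchesLiteralDot("axb")); // false
    }
}
```

So a pattern with a single \. is right when it is typed into a regex tester or read from a file, while the same pattern inside Java source needs \\. to survive string-literal escaping.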

Upvotes: 2
