EugeneP

Reputation: 12003

java email extraction regular expression?

I would like a regular expression that extracts email addresses from a String using Java regular expressions, and one that really works.

Upvotes: 8

Views: 21459

Answers (5)

Duy Pham

Reputation: 1289

Android's built-in email address pattern (android.util.Patterns.EMAIL_ADDRESS) works perfectly. Note that this class is part of the Android SDK, not the standard Java library:

    public static List<String> getEmails(@NonNull String input) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = Patterns.EMAIL_ADDRESS.matcher(input);
        while (matcher.find()) {
            emails.add(matcher.group()); // group() returns the whole matched text
        }
        return emails;
    }

Upvotes: 0

Digital Human

Reputation: 1637

A little late, but OK.

Here is what I use. Just paste it into the Firebug console and run it. Then look on the web page for a textarea (most likely at the bottom of the page); it will contain a comma-separated list of all email addresses found in A tags.

    // Load jQuery, then collect the addresses once the script has finished loading.
    var jquery = document.createElement('script');
    jquery.setAttribute('src', 'http://code.jquery.com/jquery-1.10.1.min.js');
    jquery.onload = function () {
        // The textarea needs an id so that $("#emaillist") can find it.
        var list = document.createElement('textarea');
        list.setAttribute('id', 'emaillist');
        document.body.appendChild(list);

        var lijst = "";
        $("a").each(function (idx, el) {
            // Only links whose href contains "@" are treated as email links.
            var mail = $(el).filter('[href*="@"]').attr("href");
            if (mail) {
                lijst += mail.replace("mailto:", "") + ",";
            }
        });
        $("#emaillist").val(lijst);
    };
    document.body.appendChild(jquery);

Upvotes: 1

thealy

Reputation: 51

I had to add dashes to the domain character classes so that hyphenated domains are allowed. So the final result, in Java:

    final String MAIL_REGEX = "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";
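To illustrate what the extra dashes change, here is a minimal sketch that checks one hyphenated address against this pattern and against the same pattern without dashes in the domain part. The class name HyphenDomainCheck and the sample address are made up for the example.

```java
import java.util.regex.Pattern;

public class HyphenDomainCheck {
    // Pattern from this answer: dashes are allowed in the domain labels.
    static final Pattern WITH_DASHES = Pattern.compile(
        "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})");

    // Same pattern with the dashes removed from the domain character classes.
    static final Pattern WITHOUT_DASHES = Pattern.compile(
        "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    public static void main(String[] args) {
        String address = "jane.doe@my-company.example.com"; // hypothetical sample
        System.out.println(WITH_DASHES.matcher(address).matches());    // true
        System.out.println(WITHOUT_DASHES.matcher(address).matches()); // false
    }
}
```

matches() requires the whole input to match, which suits validation; for pulling addresses out of longer text, find() is the usual choice.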

Upvotes: 5

Blessed Geek

Reputation: 21684

Install this regex tester plugin into Eclipse, and you'll have a whale of a time testing regexes:
http://brosinski.com/regex/

Points to note:
In the plugin, use only one backslash for a character escape. When you transcribe the regex into a Java/C# string literal, you have to double each backslash, because two levels of escaping apply: first the string literal's own escape mechanism, and then the regex engine's.
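As a small illustration of the double escaping, the sketch below writes the regex \. (a literal dot) as a Java string literal. The class name EscapeDemo is just a placeholder.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Regex as typed into a tester: \.    Same regex as a Java string literal: "\\."
    static final String LITERAL_DOT = "\\.";

    public static void main(String[] args) {
        // The string holds two characters: a backslash and a dot.
        System.out.println(LITERAL_DOT.length());               // 2
        // As a regex, it matches only a literal dot.
        System.out.println(Pattern.matches(LITERAL_DOT, "."));  // true
        System.out.println(Pattern.matches(LITERAL_DOT, "a"));  // false
        // An unescaped dot is a wildcard and matches any single character.
        System.out.println(Pattern.matches(".", "a"));          // true
    }
}
```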

Surround the sections of the regex whose text you wish to capture with parentheses (round brackets). You can then use the group functions in Java or C# regex to read the values of those sections.

    ([_A-Za-z0-9-]+)(\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\.[A-Za-z0-9]+)

For example, using the above regex, the following string

abc.efg@asdf.cde

yields

start=0, end=16
Group(0) = abc.efg@asdf.cde
Group(1) = abc
Group(2) = .efg
Group(3) = asdf
Group(4) = .cde

Group 0 is always the whole matched string.

If you do not enclose any section in parentheses, you can still detect a match and read the whole matched text through group 0, but you cannot capture the individual sections.
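The group functions mentioned above can be sketched in Java roughly as follows. The input is assumed to be abc.efg@asdf.cde, which is consistent with the offsets and group values listed in the example; the class name GroupDemo is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // The regex from the example, with backslashes doubled for the Java string literal.
    static final Pattern P = Pattern.compile(
        "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)");

    public static void main(String[] args) {
        Matcher m = P.matcher("abc.efg@asdf.cde"); // assumed input
        if (m.find()) {
            System.out.println("start=" + m.start() + ", end=" + m.end());
            // Group 0 is the whole match; groups 1..groupCount() are the parenthesized sections.
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("Group(" + i + ") = " + m.group(i));
            }
        }
    }
}
```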

It might be less confusing to create a few smaller regexes than one long catch-all regex, since you can test them programmatically one by one and then decide which ones should be consolidated. This helps especially when you come across a new email pattern that you had not considered before.
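One possible way to test several small patterns one by one is sketched below. The two patterns and their names ("simple", "dotted-local") are invented for illustration and are not meant as a complete set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternByPattern {
    // Hypothetical small patterns, each covering one address shape; insertion order is kept.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("simple",
            Pattern.compile("[A-Za-z0-9_-]+@[A-Za-z0-9]+\\.[A-Za-z]{2,}"));
        PATTERNS.put("dotted-local",
            Pattern.compile("[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)+@[A-Za-z0-9]+\\.[A-Za-z]{2,}"));
    }

    /** Returns the name of the first pattern matching the whole candidate, or null if none does. */
    static String firstMatch(String candidate) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(candidate).matches()) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("bob@example.com"));      // simple
        System.out.println(firstMatch("jane.doe@example.com")); // dotted-local
        System.out.println(firstMatch("not an email"));         // null
    }
}
```

Knowing which pattern matched makes it easier to see which address shapes actually occur before merging the patterns into one.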

Upvotes: 3

EugeneP

Reputation: 12003

Here's the regular expression that really works. I spent an hour searching the web and testing different approaches, and most of them didn't work, even though Google ranked those pages at the top.

I want to share with you a working regular expression (shown with the backslashes doubled, as it would appear inside a Java string literal):

    [_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})

Here's the original link: http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/
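For extraction rather than validation, the expression can be used with Matcher.find() to collect every match in a longer string. This is only a minimal sketch: the class name EmailExtractor, the method name extract, and the sample text are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractor {
    // The expression from this answer, compiled once and reused.
    private static final Pattern EMAIL = Pattern.compile(
        "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    /** Returns every substring of text that matches the pattern, in order of appearance. */
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group()); // group() is the whole matched address
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Contact bob@example.com or jane.doe@mail.example.org today.";
        System.out.println(extract(text)); // [bob@example.com, jane.doe@mail.example.org]
    }
}
```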

Upvotes: 15
