Paul

Reputation: 31

Regex to extract email address

I want to extract an email address embedded in tags, e.g. <email> [email protected] </email>, where the source is &lt;email&gt;[email protected]&lt;/email&gt;.

The expression I use is (?<=email&gt;).*(?=&lt;)/i). This works well. However, if the email is a hyperlink, i.e. &lt;email&gt;<a href="mailto:[email protected]" target="_blank">[email protected]</a> &lt;/email&gt;, then I can no longer extract the exact email address. I get <a href="mailto:[email protected]">[email protected]</a> instead of [email protected]. I have tried (?<=a href="mailto:).*(?="target="_blank")/i) but nothing is returned. Any ideas on how to extract the email when the hyperlink is there?

Upvotes: 0

Views: 306

Answers (2)

Nino Filiu

Reputation: 18473

Web dev 101: don't parse HTML with regex, use DOM manipulations instead.

The snippet below logs all the emails, whether they sit directly inside email tags, inside an a tag within email tags, or under any deeper nesting of tags.

console.log(
  Array.from(document.getElementsByTagName('email'))
  .map(elt => elt.textContent)
  .map(email => email.trim())
)
<email>[email protected]</email>
<email><a href="mailto:[email protected]">[email protected]</a></email>
<email><b><a href="mailto:[email protected]">[email protected]</a></b></email>
<email><span><b><a href="mailto:[email protected]">[email protected]</a></b></span></email>
<email>"o'brian"@irish.com</email>

The .trim() is useful in case there is whitespace in the HTML which can show up around the email.

Upvotes: 1

Abhishek

Reputation: 1618

You can walk the DOM and match an email regex against each element's text content, as in the snippet below:

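<script>
// Return every email-like substring in the given text, or null if none match.
function getEmailsFromText(text) {
    return text.match(/[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/gi);
}

// An element's textContent includes the text of all its descendants, so the
// same address is seen once per ancestor; a Set keeps each address only once.
var found = new Set();
var items = document.getElementsByTagName("*");
for (var i = 0; i < items.length; i++) {
    var emailIds = getEmailsFromText(items.item(i).textContent);
    if (emailIds) {
        emailIds.forEach(function (id) { found.add(id); });
    }
}
console.log("Email IDs: " + Array.from(found).join(", "));
</script>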

To test, open your browser's JavaScript console and paste the code from inside the script tag; it will log the email addresses found on the current HTML page.
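
If the markup is only available as a string (for example the escaped source from the question), the same regex can be applied to that string directly. The sketch below is one way to do it, not a definitive implementation: extractEmails and EMAIL_PATTERN are illustrative names, and jane.doe@example.com is a made-up placeholder address. It assumes the loose pattern used above is good enough, so quoted local parts such as "o'brian"@irish.com are not matched in full.

```javascript
// Illustrative names; the pattern is the same loose one used in the snippet above.
const EMAIL_PATTERN = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g;

function extractEmails(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = text.match(EMAIL_PATTERN) || [];
  // A linked address appears twice (in the href and in the link text);
  // a Set drops the repeat while keeping first-seen order.
  return Array.from(new Set(matches));
}

// Placeholder inputs modelled on the two cases in the question.
const plain = "&lt;email&gt; jane.doe@example.com &lt;/email&gt;";
const linked =
  '&lt;email&gt;<a href="mailto:jane.doe@example.com" target="_blank">' +
  "jane.doe@example.com</a> &lt;/email&gt;";

console.log(extractEmails(plain));
console.log(extractEmails(linked));
```

Both calls return a one-element array holding the address, because the pattern stops at the colon after mailto and at the closing quote, and the Set removes the second copy from the link text.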

Upvotes: 0
