Reputation: 31
I want to extract an email address embedded in tags, e.g. <email> [email protected] </email>,
where the source is <email>[email protected]</email>.
The expression I use is as follows: /(?<=email>).*(?=<)/i
This works well. However, if the email is a hyperlink, i.e. <email><a href="mailto:[email protected]" target="_blank">[email protected]</a> </email>,
then I can no longer extract the exact email address. I get the following:
<a href="mailto:[email protected]">[email protected]</a>
instead of [email protected]
I have tried /(?<=a href="mailto:).*(?="target="_blank")/i
but nothing is returned.
Any ideas on how to extract the email when the hyperlink is there?
Upvotes: 0
Views: 306
Reputation: 18473
Web dev 101: don't parse HTML with regex, use DOM manipulations instead.
The snippet below logs all the emails, whether they sit directly inside <email> tags, inside an <a> within <email> tags, or inside any deeper nesting of tags.
console.log(
  Array.from(document.getElementsByTagName('email'))
    .map(elt => elt.textContent)
    .map(email => email.trim())
)
<email>[email protected]</email>
<email><a href="mailto:[email protected]">[email protected]</a></email>
<email><b><a href="mailto:[email protected]">[email protected]</a></b></email>
<email><span><b><a href="mailto:[email protected]">[email protected]</a></b></span></email>
<email>"o'brian"@irish.com</email>
The .trim() call is useful because whitespace in the HTML can show up around the email.
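If the same lookup is needed in more than one place, the snippet above could be wrapped in a small function. This is only a sketch: `extractEmails` and its `root` parameter are names made up for illustration, and it assumes `root` is a document or element exposing `getElementsByTagName`. It also drops `<email>` elements that contain nothing but whitespace, which the original snippet does not do.

```javascript
// Hypothetical helper (not part of the original answer): collect the trimmed
// text of every <email> element found under `root`.
function extractEmails(root) {
  return Array.from(
    root.getElementsByTagName('email'),
    elt => elt.textContent.trim()   // textContent ignores nested tags such as <a> or <b>
  ).filter(text => text.length > 0); // skip <email> elements that are empty
}

// Only touch `document` when the code runs in a browser.
if (typeof document !== 'undefined') {
  console.log(extractEmails(document));
}
```

Because `textContent` returns the text of all descendants with the markup removed, the same function covers the plain and the hyperlinked cases from the question.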
Upvotes: 1
Reputation: 1618
You can read the text content of the DOM and match an email regex against it, as in the snippet below:
<script>
// Return every email-like substring of `text`, or an empty array if there is none.
function getEmailsFromText(text) {
  return text.match(/[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g) || [];
}

// An element's textContent already includes the text of all its descendants,
// so scanning <body> once avoids logging the same address once per ancestor element.
var emailIds = getEmailsFromText(document.body.textContent);
if (emailIds.length) {
  console.log("Email IDs: " + emailIds);
}
</script>
To test, open your browser's JavaScript console and paste the code from inside the script tag; it logs every email address found in the current HTML page.
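When the markup is only available as a string, the same pattern can be applied to it directly. In the hyperlink case from the question the address then matches twice, once in the href and once in the link text, so the sketch below removes duplicates. `uniqueEmails` is a hypothetical name, and `jane@example.com` is a made-up address standing in for the real one.

```javascript
// Hypothetical helper: return each email-like substring of `text` once,
// in order of first appearance.
function uniqueEmails(text) {
  const pattern = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return [...new Set(text.match(pattern) || [])];
}

// Made-up sample in the shape of the markup from the question.
const sample =
  '<email><a href="mailto:jane@example.com" target="_blank">jane@example.com</a> </email>';

// Logs a one-element array containing jane@example.com.
console.log(uniqueEmails(sample));
```

The character classes here are deliberately simple, so a quoted local part such as "o'brian" would not be matched in full.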
Upvotes: 0