Reputation: 2053
I've extracted some HTML from GmailApp using .getBody() and would like to return HTML with a specific tag and its contents filtered out wherever the contents match any value in an array (specifically, links with certain text). Looking at this solution, I figure the easiest way to do this would be to use Xml.parse()
and filter the resulting object, but I can't get beyond creating the XmlDocument.
For example, if:
var html = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';
and
var linksToRemove = ['baa','foo'];
how could I return
var newHtml = '<div>some text then <div></div> and then <span>some ,and finally <a href="http://example3.com">close</a></span></div>';
using
var obj = Xml.parse(html, true);
I can get an object to process, but it all falls apart from there. (I also considered just using .replace(),
but given the pitfalls of matching HTML with a regex, I thought it best to avoid it.)
Upvotes: 1
Views: 1251
Reputation: 2053
Following the suggestion, I opted to try using a regex:
var html = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';
var linksToRemove = ['baa', 'foo'];
var newHtml = cleanBody(html, linksToRemove);
/**
* Removes links from html text
* @param {string} html The html to be cleaned.
* @param {array} exclude The array of link text to remove.
* @returns {string} Cleaned html.
*/
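function cleanBody(html, exclude) {
  html = html.replace(/\r?\n|\r|\t/g, ''); // strip line breaks and tabs
  // Escape regex metacharacters so each entry is matched as literal text.
  var escaped = exclude.map(function (text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  var re = '<a\\b[^>]*>(' + escaped.join('|') + ')<\\/a>';
  return html.replace(new RegExp(re, 'ig'), '');
}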
Test at http://jsfiddle.net/HdsPU/
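As a quick check of the approach, here is a minimal sketch that runs the same kind of pattern against the sample input from the question. The helper names `escapeRegExp` and `removeLinksByText` are hypothetical and not part of the original answer. The escaping step and the empty-array guard are added assumptions, so that link text containing regex metacharacters (for example `a.b` or `c++`) is matched literally. Unlike `cleanBody`, it does not strip line breaks or tabs first, and it only removes anchors whose contents are exactly the plain link text; anchors with nested tags are left untouched.

```javascript
// Hypothetical helper: escape regex metacharacters so text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of cleanBody: removes <a> elements whose text equals
// (case-insensitively) one of the entries in `exclude`.
function removeLinksByText(html, exclude) {
  if (exclude.length === 0) {
    return html; // nothing to remove; avoids building an empty alternation
  }
  var pattern = '<a\\b[^>]*>(?:' + exclude.map(escapeRegExp).join('|') + ')<\\/a>';
  return html.replace(new RegExp(pattern, 'ig'), '');
}

// Sample input and expected result, both taken from the question.
var sample = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';
var expected = '<div>some text then <div></div> and then <span>some ,and finally <a href="http://example3.com">close</a></span></div>';

var cleaned = removeLinksByText(sample, ['baa', 'foo']);
```

With the question's input, `cleaned` should equal `expected`. Because `[^>]*` cannot cross a `>`, each match stays within a single opening tag, which is why the "close" link survives.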
Upvotes: 1