Removing html tags and content where tag content matches an array of values using Xml.parse()

Question

I've extracted some HTML from GmailApp using .getBody() and would like to return HTML with a specific tag and its contents removed wherever the contents match any value in an array (specifically, links with certain text). Looking at this solution, I figure the easiest way to do this would be to use Xml.parse() and filter the resulting object, but I can't get beyond creating the XmlDocument.

For example, if:

var html = 'some text then <a href="...">foo</a>' +
    ' and then some <a href="...">baa</a>,and finally close';

and

var linksToRemove = ['baa','foo'];

how could I return

var newHtml = 'some text then ' +
    ' and then some ,and finally close';

using

var obj = Xml.parse(html, true);

I can get an object to process, but it all falls apart from there. (I did also consider just using .replace(), but given the issues with matching HTML using regex I thought it best to avoid it.)

mhawksey · Accepted Answer

Following the suggestion, I opted to try using a regex:

var html = 'some text then <a href="...">foo</a>' +
    ' and then some <a href="...">baa</a>,and finally close';

var linksToRemove = ['baa', 'foo'];
var newHtml = cleanBody(html, linksToRemove);

/**
 * Removes links from html text
 * @param {string} html The html to be cleaned.
 * @param {array} exclude The array of link text to remove.
 * @returns {string} Cleaned html.
 */
function cleanBody(html, exclude) {
    html = html.replace(/\r?\n|\r|\t/g, ''); // remove line breaks and tabs
    // Match an <a> element whose entire text is one of the excluded values
    var re = '<a\\b[^>]*>(' + exclude.join('|') + ')<\/a>';
    return html.replace(new RegExp(re, 'ig'), '');
}
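
One caveat with building the pattern from exclude.join('|'): the entries are treated as regex source, so link text such as "a.b" would also match "axb", and text such as "C++" would make the pattern invalid. A minimal sketch of a variant that escapes each entry first is below. The names escapeRegExp and cleanBodyEscaped are hypothetical and not part of the answer above, and the guard for an empty array is an added assumption about the desired behaviour.

```javascript
// Hypothetical helper (not in the original answer): escape regex
// metacharacters so that each entry is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of cleanBody that escapes every entry before
// joining them into the alternation. Unlike cleanBody, it does not
// strip line breaks and tabs from the rest of the html.
function cleanBodyEscaped(html, exclude) {
    if (exclude.length === 0) {
        // Assumption: with nothing to exclude, return the input unchanged.
        // An empty alternation would otherwise match links with no text.
        return html;
    }
    var alternatives = exclude.map(escapeRegExp).join('|');
    var re = new RegExp('<a\\b[^>]*>(?:' + alternatives + ')<\\/a>', 'ig');
    return html.replace(re, '');
}

var sample = 'see <a href="#">C++ (draft)</a> and <a href="#">keep me</a>';
var cleaned = cleanBodyEscaped(sample, ['C++ (draft)']);
```

Here cleaned keeps the second link and drops the first, because only a link whose whole text equals an entry is removed. Matching stays case-insensitive, as in the original function.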

Test at http://jsfiddle.net/HdsPU/
