user734094

Extract substrings enclosed by HTML tags in ColdFusion

This may be a silly question. Let's say that we have a string:

"my name <em>is</em> Tom <em>Papas</em> and I am 30 <em>years</em> of age<em>!</em>"

The question is: how do we extract the substrings enclosed within the <em> tags and output them as a list, an array, or a comma-delimited string using ColdFusion? Note that we don't know in advance which substrings are enclosed within the tags, so we need to extract them blindly.

Thank you in advance,

Tom

Greece

Upvotes: 0

Views: 1508

Answers (2)

Russ

Reputation: 1951

In CF 10 or Railo 4, you can combine xmlParse() with Underscore.cfc's map() function, like so:

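// _ is assumed to be an Underscore.cfc instance created beforehand
str = "my name <em>is</em> Tom <em>Papas</em> and I am 30 <em>years</em> of age<em>!</em>";
// Wrap the fragment in a single root element so it parses as well-formed XML
str = "<myWrapper>" & str & "</myWrapper>";
xmlObj = xmlParse(str);
// Each child of the wrapper is an <em> element; collect its text
resultAsArray = _.map(xmlObj.myWrapper.xmlChildren, function (val) {
    return val.xmlText;
});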

(Disclaimer: I wrote Underscore.cfc)

Upvotes: 0

Henry

Reputation: 32905

Download jsoup and put the jar in your CF's lib folder:

html = "my name <em>is</em> Tom <em>Papas</em> and I am 30 <em>years</em> of age<em>!</em>";

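// Parse the fragment with jsoup and select every <em> element
dom = createObject("java", "org.jsoup.Jsoup").parse(html);
emElements = dom.getElementsByTag("em");

// Collect the text content of each element
results = [];
for (em in emElements)
    arrayAppend(results, em.text());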

For more info: http://www.bennadel.com/blog/2358-Parsing-Traversing-And-Mutating-HTML-With-ColdFusion-And-jSoup.htm

Or use a basic regex:

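// Find every <em>...</em> pair that contains no nested tags
matches = reMatch("<em>[^<]*</em>", html);
results = [];
// Strip the surrounding tags from each match, keeping only the inner text
for (match in matches)
    arrayAppend(results, reReplace(match, "<em>(.*)</em>", "\1"));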
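
A possible variation on the regex approach, offered only as a sketch: it assumes that calling java.util.regex directly from CFML is acceptable in your environment. A capture group pulls out the inner text in a single pass, and arrayToList() produces the comma-delimited string the question asks for. The variable names (pattern, matcher, resultList) are illustrative.

```cfml
html = "my name <em>is</em> Tom <em>Papas</em> and I am 30 <em>years</em> of age<em>!</em>";

// Compile a pattern whose first capture group is the text between the tags.
// The lazy quantifier (.*?) stops at the nearest closing </em>.
pattern = createObject("java", "java.util.regex.Pattern").compile("<em>(.*?)</em>");
matcher = pattern.matcher(html);

results = [];
while (matcher.find()) {
    // javaCast avoids ambiguity between group(int) and group(String)
    arrayAppend(results, matcher.group(javaCast("int", 1)));
}

// Join the array into a comma-delimited string
resultList = arrayToList(results);
```

Like the regex above, this does not handle <em> tags that carry attributes or nested markup; for anything beyond simple fragments, a real parser such as jsoup is the more robust choice.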

Upvotes: 1
