Finding text between <script></script> tags with RegEx for ColdFusion, including line breaks

Question

I am trying to extract javascript code from HTML content that I receive via CFHTTP request.

I have this simple regex that catches everything as long as there is no line break in the code between the tags.

var result=REMatch("<script[^>]*>(.*?)</script>",html);

This will catch a script block whose content sits on a single line, but not one whose content contains line breaks.

I have tried to use (?m) for multiline, but it doesn't work like that. I am using the reference to figure it out but I am just not getting it with regex.

Heads up: normally there would be JavaScript between the script tags rather than simple text, so the content also includes characters like {}();:-_ etc.

Can anyone help me out?

Cheers

[[UPDATE]] Thanks guys, I will try the solutions. I favor regex, but I will look into the HTML parser too.

Peter Boughton · Accepted Answer

(?m) multiline mode is for making ^ and $ match on line breaks (not just start/end of string as is default), but what you're trying to do here is make . include newlines - for that you want (?s) (dot-all mode).
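The difference between the two flags can be checked with a small sketch. It uses Java's java.util.regex rather than ColdFusion's own regex engine (an assumption made so the snippet runs on its own). The flags mean the same thing there, but defaults may differ between engines. The class name DotAllDemo, the helper findAll, and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: returns every full match of `regex` in `input`,
    // roughly what reMatch() returns in CF.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample input with line breaks inside the script block.
        String html = "<p>x</p><script type=\"text/javascript\">\nvar a = 1;\n</script>";
        String body = "<script[^>]*>(.*?)</script>";

        // No flag: '.' stops at the line break, so nothing matches.
        System.out.println(findAll(body, html).size());          // 0
        // (?m) only changes ^ and $, so still nothing matches.
        System.out.println(findAll("(?m)" + body, html).size()); // 0
        // (?s) lets '.' cross line breaks, so the block is found.
        System.out.println(findAll("(?s)" + body, html).size()); // 1
    }
}
```

In this sketch only the (?s) variant finds the block, which is the behaviour described above.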

However, I probably wouldn't do this with regex - a HTML parser is a more robust solution. Here's how to do it with jSoup:

// html() returns the script contents; text() ignores data nodes such as script bodies
var result = jsoup.parse(html).select('script').html();

More details on using jSoup in CF are available here, or alternatively you can use the TagSoup parser, which ships with CF10 (so you don't need to worry about jars/etc).

If you really want regex, then you can use this:

var result = rematch('<script[^>]*>(?:[^<]+|<(?!/script>))+',html);

Unlike using (?s).*? this avoids matching empty blocks (but it will still fail in certain edge cases - if accuracy is required use a HTML parser).
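To see the "empty blocks" difference in practice, here is a rough comparison of the two patterns. It is again written against Java's java.util.regex as a stand-in for CF's engine (an assumption). EmptyBlockDemo, findAll, and the sample HTML are hypothetical names and data, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyBlockDemo {
    // Lazy dot-all version: also matches blocks with nothing between the tags.
    static final String LAZY = "(?s)<script[^>]*>.*?</script>";
    // Pattern from the answer: needs at least one character before </script>.
    static final String NON_EMPTY = "<script[^>]*>(?:[^<]+|<(?!/script>))+";

    // Hypothetical helper: collects every full match of `regex` in `input`.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // One external (empty) script tag and one inline script with line breaks.
        String html = "<script src=\"lib.js\"></script>\n"
                    + "<script>\nif (a < b) { go(); }\n</script>";

        System.out.println(findAll(LAZY, html).size());      // 2 (empty block included)
        System.out.println(findAll(NON_EMPTY, html).size()); // 1 (empty block skipped)
    }
}
```

Note that the second pattern stops just before the closing tag, so each match is the opening tag plus the script body, without </script>.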

To extract just the text from the first script block, you can strip the script tag with this:

result = ListRest( result[1] , '>' );
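The same stripping step can be applied to every match, not just the first. The sketch below does this in Java as a stand-in for CF (an assumption). ScriptBodyDemo, stripOpeningTag, and scriptBodies are hypothetical names, and the substring call is only a rough analogue of ListRest with '>' as the delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptBodyDemo {
    // Opening tag plus a non-empty body, stopping before </script>.
    private static final Pattern SCRIPT_BLOCK =
        Pattern.compile("<script[^>]*>(?:[^<]+|<(?!/script>))+");

    // Drops everything up to and including the first '>' (the opening tag).
    static String stripOpeningTag(String match) {
        return match.substring(match.indexOf('>') + 1);
    }

    // Returns the body of every non-empty script block found in `html`.
    static List<String> scriptBodies(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = SCRIPT_BLOCK.matcher(html);
        while (m.find()) {
            bodies.add(stripOpeningTag(m.group()));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String html = "<script type=\"text/javascript\">\nvar a = 1;\n</script>"
                    + "<script>b();</script>";
        // Prints each script body, including its original line breaks.
        for (String body : scriptBodies(html)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Like the ListRest call, this assumes the first '>' in the match closes the opening tag, so an attribute value containing '>' would break it.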
