user3923442

Reputation: 41

JavaScript regular expression only matches the last pattern

Context: I have some dynamically generated HTML that can contain embedded JavaScript function calls. I'm trying to extract those function calls with a regular expression.

Sample HTML string:

 <dynamic html>

   <script language="javascript">
       funcA();
   </script>

 <a little more dynamic html>

   <script language="javascript">
       funcB();
   </script>

My goal is to extract the text "funcA();" and "funcB();" from the above snippet (either as a single string or an array with two elements would be fine). The regular expression I have so far is:
var regexp = /[\s\S]*<script .*>([\s\S]*)<\/script>[\s\S]*/gm;

Using html_str.replace(regexp, "$1") only returns "funcB();".

This regexp works fine when there is only ONE set of <script> tags in the HTML, but when there are several, replace() returns only the LAST one. Removing the 'g' flag makes no difference: it still matches only the last function call. I'm still a novice with regular expressions, so I know I'm missing something fundamental here. Any help pointing me in the right direction would be greatly appreciated. I've done some research already but haven't been able to resolve this.

Upvotes: 0

Views: 128

Answers (1)

Julian

Reputation: 757

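Your wildcard matches are all greedy, so each one matches as much text as it possibly can rather than stopping at the first place the rest of the pattern fits. The leading [\s\S]* consumes everything up to the last <script> tag, which is why only funcB(); ends up in the capture group, with or without the g flag.

Make them all non-greedy (for example .*? and [\s\S]*?) and it should work.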
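As a rough sketch of the non-greedy approach, the snippet below collects every script body into an array instead of using replace(). The names htmlStr, scriptBody and extractScriptBodies are made up for illustration, the sample input is only modeled on the HTML in the question, and it assumes an environment that supports String.prototype.matchAll.

```javascript
// Sample input modeled on the question's HTML (illustrative only).
const htmlStr = [
  '<div>dynamic html</div>',
  '<script language="javascript">',
  '    funcA();',
  '</script>',
  '<p>a little more dynamic html</p>',
  '<script language="javascript">',
  '    funcB();',
  '</script>',
].join('\n');

// [\s\S]*? is lazy: it stops at the first </script> it can reach,
// so each <script> block produces its own match.
const scriptBody = /<script\b[^>]*>([\s\S]*?)<\/script>/g;

// Hypothetical helper: returns the trimmed contents of every script block.
function extractScriptBodies(html) {
  const bodies = [];
  for (const match of html.matchAll(scriptBody)) {
    bodies.push(match[1].trim());
  }
  return bodies;
}

const calls = extractScriptBodies(htmlStr);
console.log(calls); // [ 'funcA();', 'funcB();' ]
```

Dropping the leading and trailing [\s\S]* from the pattern means each match covers just one script block, which is what lets the g flag find them one after another.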

Upvotes: 5
