Kawd

Reputation: 4450

Regex match not working when string is too long?

I have the following string:

var example = '{%start%}$MOXDATA${"name":"group one","sections":[{"name":"section one","fields":[{"name":"plain one","type":"plain","value":"// some \"plain\" \'\"\"\"\'\' \'\'  \'    \'  \" \" \" tesP(&^&#37;I&63657riu43r3+_)(I)p;l&gt;:\"&gt;&lt;/&#125;&#125;&#123;|\":1~~``"},{"name":"rich one","type":"rich","value":"<ul>\n<li><span style=\"font-size: 11px;\">{ Lo<span style=\"font-family: \'comic sans ms\', sans-serif;\">rem</span> ipsu<span style=\"color: #ffff00; background-color: #339966;\">m dolor si</span>t amet, consec<strong>tetur adi</strong>piscing elit. Vestibulum ac dolor pulvinar ipsum luctus ullamcorper.</span></li>\n<li></li>\n<li><a href=\"http://retrgfd.com/resrgf\">erwfd\"etrgfdd\'\'refre\"\'\"refrds\'\"\"\"sdgfd</a></li>\n</ul>"},{"name":"repeater one","type":"repeater","value":[[{"name":"plain one","type":"plain","value":"some test value"},{"name":"rich one","type":"rich","value":"some test value"},{"name":"link one","type":"link","value":"some test value"},{"name":"media one","type":"media","value":"some test value"},{"name":"link two","type":"link","value":"some test value"}]]}]},{"name":"section two","fields":[{"name":"link one","type":"link","value":"<a href=\"http://www.yyyy.com\">take me to your leader</a>"}]}]}$MOXDATA${%end%}';

I'm running example.match(/{%start%}\$MOXDATA\$(.+)\$MOXDATA\${%end%}/); which returns null.

However, if I use a significantly shorter version of the above string, as in:

var shorter = '{%start%}$MOXDATA${"name""}]}]}$MOXDATA${%end%}';
shorter.match(/{%start%}\$MOXDATA\$(.+)\$MOXDATA\${%end%}/);

{"name""}]}]} is then correctly matched.

Why is that? What am I doing wrong?

Upvotes: 1

Views: 2295

Answers (2)

Casimir et Hippolyte

Reputation: 89547

Anony-Mousse's answer is good, and so is stribizhev's comment.

However, when you have to deal with a long string, you should use something that causes less backtracking. [^]* or [\s\S]* matches every character, newlines included, up to the end of the string; the regex engine must then go back character by character until it finds $MOXDATA${%end%}. That's a lot of work.

To avoid this work, you can replace [^]* or [\s\S]* with:
[^$]*(?:\$+(?!MOXDATA\${%end%})[^$]*)*

or, more robustly (for the case where $MOXDATA${%end%} doesn't exist in the string):
(?=([^$]*))\1(?=((?:\$+(?!MOXDATA\${%end%})[^$]*)*))\2

(The construct (?=(subpattern))\1 emulates an atomic group, which JavaScript regexes don't support natively.)

In this way the subpattern MOXDATA\${%end%} is only tested on each $.
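As a rough sketch, here is how both patterns could be dropped into the regex from the question. The sample string is a shortened, hypothetical stand-in for the real data, and the backreferences are renumbered (\2 and \3 instead of \1 and \2) only because I wrapped everything in an outer capture group to extract the payload:

```javascript
// Hypothetical shortened input: the payload contains a "$" and a real newline.
const sample = '{%start%}$MOXDATA${"price":"$5","note":"a\nb"}$MOXDATA${%end%}';

// Unrolled version: the lookahead is only tried when a "$" is reached.
const unrolled =
  /{%start%}\$MOXDATA\$([^$]*(?:\$+(?!MOXDATA\${%end%})[^$]*)*)\$MOXDATA\${%end%}/;

// Emulated atomic groups. Group 1 is the payload, so the inner groups
// become 2 and 3 (they are 1 and 2 in the standalone pattern above).
const atomic =
  /{%start%}\$MOXDATA\$((?=([^$]*))\2(?=((?:\$+(?!MOXDATA\${%end%})[^$]*)*))\3)\$MOXDATA\${%end%}/;

const unrolledMatch = sample.match(unrolled);
const atomicMatch = sample.match(atomic);
console.log(unrolledMatch[1] === atomicMatch[1]); // true

// Input with no closing delimiter: both patterns report no match.
const broken = '{%start%}$MOXDATA${"price":"$5"}';
const brokenUnrolled = broken.match(unrolled);
const brokenAtomic = broken.match(atomic);
console.log(brokenUnrolled, brokenAtomic); // null null
```

Both patterns capture the same payload here. The difference only matters when the closing delimiter is missing, where the atomic variant has fewer positions to retry before giving up.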

Upvotes: 4

Has QUIT--Anony-Mousse

Reputation: 77454

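By default, . does not match newlines, so .* (or .+) stops at the first one. Your long string contains \n escapes, which become real newline characters; the short one doesn't.

Try [^]* instead, which really does match any character, newlines included.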
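A minimal sketch of the difference, using a shortened, hypothetical sample rather than the full string from the question. The last pattern uses the s (dotAll) flag, which is an alternative I'm adding on the assumption that your engine supports ES2018:

```javascript
// Hypothetical shortened input; the "\n" escapes become real newline characters.
const sample = '{%start%}$MOXDATA${"value":"<ul>\n<li>x</li>\n</ul>"}$MOXDATA${%end%}';

// "." does not match line terminators, so (.+) cannot span the payload.
const dotPattern = /{%start%}\$MOXDATA\$(.+)\$MOXDATA\${%end%}/;

// [^] (or [\s\S]) matches any character, newlines included.
const anyPattern = /{%start%}\$MOXDATA\$([^]+)\$MOXDATA\${%end%}/;

// Assumption: ES2018 engine. The "s" flag makes "." match newlines as well.
const dotAllPattern = /{%start%}\$MOXDATA\$(.+)\$MOXDATA\${%end%}/s;

const dotResult = sample.match(dotPattern);
const anyResult = sample.match(anyPattern);
const dotAllResult = sample.match(dotAllPattern);

console.log(dotResult);                           // null
console.log(anyResult[1] === dotAllResult[1]);    // true
```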

Upvotes: 3
