Cédric

Reputation: 411

Regex - match any text between some delimiters

I am trying to match the string [[....]] (including the brackets)

where .... can be anything (including non-printable characters) except ]].

Here is the source text to match against:

var myString = `blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",
                         48.89141725,
                         2.23478235,
                         "4T"],
    ["<strong>ANGERS</strong><br />Centre commercial GEANT",
                         48.89141725,
                         2.23478235,
                         "4T"]]blablablabla`

I tried using the negated character class [^\]]+ to match every character except a closing bracket. The problem is that I do not know how to make it exclude only a bracket that comes immediately after another bracket; writing [^\]\]]+ does not do that.
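A minimal sketch of why doubling the bracket inside the class does not help: a character class always consumes exactly one character, so [^\]\]] describes the same set as [^\]], and both stop at the first inner ]. The sample string below is the question's input with its indentation shortened and its line breaks written as \n escapes; the variable names are illustrative only.

```javascript
// The question's input, abbreviated: indentation shortened, newlines as \n.
const sample = 'blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",\n 48.89141725,\n 2.23478235,\n "4T"],\n ["<strong>ANGERS</strong><br />Centre commercial GEANT",\n 48.89141725,\n 2.23478235,\n "4T"]]blablablabla';

// A character class matches exactly one character, so listing ] twice
// changes nothing: both classes mean "any single character except ]".
const single = sample.match(/\[\[[^\]]+/)[0];
const doubled = sample.match(/\[\[[^\]\]]+/)[0];

console.log(single === doubled);      // true
console.log(single.endsWith('"4T"')); // true: stopped before the inner ]
```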

Is there a solution using a positive/negative lookahead or a word boundary?

(\[\[[^\](?=\])]+)
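A quick check of the attempt above, offered as a sketch: inside [...] the lookahead syntax loses its meaning, so (?= and ) are just literal characters added to the negated set. The class therefore excludes ], (, ?, = and ), and the match stops at the first ( in the input. The shortened sample string and the name `equivalent` are illustrative assumptions.

```javascript
const sample = '[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong>", "4T"]]';

// The attempt from the question, unchanged.
const attempt = /(\[\[[^\](?=\])]+)/;
// Same class written out plainly: any character except ] ( ) ? =
const equivalent = /(\[\[[^\]()?=]+)/;

const got = sample.match(attempt)[1];
console.log(got === '[["<strong>LA DEFENSE 4 TEMPS '); // true: stopped at "("
console.log(got === sample.match(equivalent)[1]);     // true
```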

Any help, please?

Upvotes: 5

Views: 3234

Answers (2)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626728

In JavaScript, matching any text between delimiters that consist of more than one character is best done with the [^]/[\s\S]/[\d\D]/[\w\W] construct combined with a lazy quantifier (*? matches 0 or more occurrences of the preceding subpattern and +? matches 1 or more, but each matches as few as possible while still producing a valid match).
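A small sketch of why the quantifier has to be lazy, using a made-up input with two bracketed blocks: the greedy [\s\S]* runs to the last ]] in the string and swallows both blocks, while [\s\S]*? stops at the first ]] it can reach.

```javascript
// Hypothetical input: two [[...]] blocks, the first spanning a newline.
const text = 'a[[first\nblock]] b [[second]] c';

const greedy = text.match(/\[\[[\s\S]*\]\]/g);
const lazy = text.match(/\[\[[\s\S]*?\]\]/g);

console.log(greedy.length); // 1: from the first [[ to the last ]]
console.log(lazy.length);   // 2: one match per block
console.log(lazy[1]);       // [[second]]
```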

While the [^] construct (matching any character, including a newline) is JavaScript-specific, [\s\S] and its variants are largely cross-platform constructs that also work in PCRE, .NET, Python, Java, etc. The [...] here is a character class containing two opposite shorthand classes: since \s matches all whitespace characters and \S matches all non-whitespace characters, [\s\S] matches any character at all.
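A sketch comparing these constructs on a hypothetical two-line input: a plain . stops at the newline, while each of the "any character" classes crosses it. The last entry uses the s (dotAll) flag, which was added in ES2018 and is not mentioned in the answer; it is included only as a point of comparison and needs a reasonably modern engine.

```javascript
const input = '[[line one\nline two]]';

// Each entry: [label, pattern]. All use a lazy quantifier.
const patterns = [
  ['.', /\[\[.*?\]\]/],        // . does not match \n, so this fails
  ['[^]', /\[\[[^]*?\]\]/],    // JavaScript-only "any character"
  ['[\\s\\S]', /\[\[[\s\S]*?\]\]/],
  ['[\\d\\D]', /\[\[[\d\D]*?\]\]/],
  ['[\\w\\W]', /\[\[[\w\W]*?\]\]/],
  ['. with s flag', /\[\[.*?\]\]/s], // ES2018 dotAll
];

for (const [label, re] of patterns) {
  console.log(label, re.test(input)); // false for ".", true for the rest
}
```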

I'd recommend avoiding (.|\n): this construct causes more backtracking steps and slows the regex search down.

So, you can use

\[\[[\d\D]*?]]

Here is a code snippet:

var re = /\[\[[\d\D]*?]]/g; 
var str = 'blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",\n                         48.89141725,\n                         2.23478235,\n                         "4T"],\n    ["<strong>ANGERS</strong><br />Centre commercial GEANT",\n                         48.89141725,\n                         2.23478235,\n                         "4T"]]blablablabla';
var m;
 
while ((m = re.exec(str)) !== null) {
    console.log(m[0]);
}

UPDATE

In this case, where the delimiters are different and consist of just 2 characters, you can match all characters other than the first symbol of the closing delimiter, then 0 or more sequences of that symbol followed by 1 or more characters other than it, and finally the closing delimiter itself:

\[\[[^\]]*(?:][^\]]+)*]]

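Because this pattern consumes the text in a single linear pass, without testing for the closing ]] at every position as the lazy version does, it is very fast.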
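A sketch checking that the unrolled pattern and the lazy pattern return the same match on the question's input (abbreviated here: indentation shortened, newlines written as \n). The comments break the unrolled pattern into its three parts; the variable names are illustrative.

```javascript
const str = 'blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",\n 48.89141725,\n 2.23478235,\n "4T"],\n ["<strong>ANGERS</strong><br />Centre commercial GEANT",\n 48.89141725,\n 2.23478235,\n "4T"]]blablablabla';

// \[\[          opening delimiter
// [^\]]*        a run of characters other than ]
// (?:][^\]]+)*  a single ] followed by at least one character other than ]
// ]]            closing delimiter
const unrolled = /\[\[[^\]]*(?:][^\]]+)*]]/g;
const lazy = /\[\[[\d\D]*?]]/g;

const a = str.match(unrolled);
const b = str.match(lazy);
console.log(a.length, b.length); // 1 1
console.log(a[0] === b[0]);      // true
```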

P.S. Note that in a JS regex you do not need to escape ] outside of a character class (as long as the u flag is not used), but inside a character class it must always be escaped.
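A sketch illustrating the escaping rules above. In JavaScript an unescaped ] right after [ closes an empty class, so [] can never match; and in Unicode mode (the u flag) a lone ] outside a class is rejected as a syntax error, which is why the "no escape needed" rule only holds without that flag.

```javascript
// Without the u flag, a lone ] outside a class is a literal.
console.log(/a]/.test('a]'));  // true

// Inside a class, ] must be escaped.
console.log(/[\]]/.test(']')); // true
// Unescaped: [] is an empty class (matches nothing), then a literal ].
console.log(/[]]/.test(']'));  // false

// With the u flag, the lone ] is a syntax error.
let threw = false;
try {
  new RegExp('a]', 'u');
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw); // true
```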

Upvotes: 2

lintmouse
lintmouse

Reputation: 5119

Try this:

\[\[(.|\n)*?\]\]

https://regex101.com/r/gR5oJ3/1

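It should match [[ and ]] together with everything between them. The main issue was dealing with newlines: . does not match \n, so the (.|\n) part adds newlines back in. Note, however, that . also does not match \r (nor \u2028 and \u2029), so this pattern fails on text with Windows-style \r\n line endings.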
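A minimal sketch of the line-ending caveat, using a hypothetical input with a \r\n line break: (.|\n) cannot consume the \r, so the match fails, while a [\s\S] class handles it.

```javascript
const crlf = 'x[[line one\r\nline two]]y';

const dotOrNewline = /\[\[(.|\n)*?\]\]/;
const anyChar = /\[\[[\s\S]*?\]\]/;

console.log(dotOrNewline.test(crlf)); // false: neither . nor \n matches \r
console.log(anyChar.test(crlf));      // true
```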

Upvotes: 1
