sss71

Reputation: 23

Regex poor performance when nothing matches

I have a problem with a slow regex, but only when the pattern doesn't match. In all other cases performance is acceptable, even if the pattern matches at the very end of the text. I'm testing performance on a 100KB text input.

What I am trying to do is convert input written in an HTML-like syntax, which uses [] instead of <> brackets, into valid XML.

Sample input:

...some content[vc_row param="test1"][vc_column]text [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content

Sample output:

...some content<div class="vc_row" param="test1"><div class="vc_column" >text [brackets in text] content</div></div><div class="vc_row" param="xxx">text content</div>...some more content

To do this I am using regex:

/(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/

I run this in a while loop for as long as the pattern matches.

As I mentioned, this works, but the last iteration is extremely slow (or the first one, if nothing matches). Here is the complete JavaScript I am using:

var str   = '...some content[vc_row param="test1"][vc_column]text content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

var regex = /(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/;
var matches;
// Rewrite one tag per iteration until the pattern no longer matches.
while ((matches = str.match(regex))) {
    if (matches[2].slice(1, 2) !== '/')
        str = matches[1] + "<div class=\"" + matches[2].slice(1) + "\"" + " " + matches[4] + ">" + matches[6];
    else
        str = matches[1] + "</div>" + matches[6];
}

How can I improve my regex's performance when nothing matches?

Upvotes: 2

Views: 155

Answers (2)

LukStorms

Reputation: 29677

You can split it into two regexes: one for the start tags and one for the closing tags.

Then chain two global (g) replaces.

var str   = '...some content[vc_row param="test1"][vc_column]text with [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

const reg1 = /\[(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;
const reg2 = /\[\/(vc_(?:column|row))\s*\]/g;

var result = str.replace(reg1, "<div class=\"$1\"$2>").replace(reg2, "</div>");

console.log(result);

Note that those (.*) in the original regex aren't needed this way.
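To see why dropping the (.*) groups matters, here is a rough timing sketch. As far as I can tell, when nothing matches, a backtracking engine has to retry the leading (.*) from every start position, so the work grows roughly quadratically with the input length. The helper name timeSearch and the 10,000-character test string are my own assumptions for illustration, and the start-tag regex is used without the g flag so that test() keeps no state. Actual timings will vary by engine and machine.

```javascript
// Hypothetical helper (not from the question): time a single search in ms.
function timeSearch(regex, text) {
    const start = Date.now();
    const found = regex.test(text);
    return { found: found, ms: Date.now() - start };
}

// Assumed test input: no tags at all, so neither regex can match.
const noTags = 'x'.repeat(10000);

// The regex from the question, with its leading and trailing (.*) groups.
const original = /(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/;

// The start-tag regex from this answer, without the g flag.
const trimmed = /\[(vc_(?:column|row))(\s+[^\]]+)?\s*\]/;

const slow = timeSearch(original, noTags);
const fast = timeSearch(trimmed, noTags);

console.log('original: ' + slow.ms + ' ms, trimmed: ' + fast.ms + ' ms');
```

Doubling the length of noTags should make the gap between the two more visible if the quadratic explanation holds on your engine.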

With an anonymous callback function, it can be done with a single regex replace.

var str   = '...some content[vc_row param="test1"][vc_column]text with [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

const reg = /\[(\/)?(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;

var result = str.replace(reg, function(m,c1,c2,c3){
              if(c1) return "</div>";
              else return "<div class=\""+ c2 +"\""+ (c3?c3:"") +">";
             });

console.log(result);

Upvotes: 1

SamWhan

Reputation: 8332

How about a replace? Something like:

str.replace(/\[(\/?)(vc_column|vc_row)([^\]]*?)\]/g, function(a,b,c,d) {
    return '<' + b + 'div' + (b==='/' ? '' : ' class="' + c + '"') + d + '>';
    });

This matches a whole tag (start or end), brackets and attributes included, while capturing everything except the brackets. It then puts the pieces back together in the correct format (divs with classes).

And the global flag (/../g) removes the need for any loops.
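As a quick check of the no-loop claim, here is a minimal sketch that wraps the same replace in a function and also runs it on input with no tags at all, which was the slow case in the question. The function name convertShortcodes, the parameter names, and both test strings are made up for illustration.

```javascript
// Hypothetical wrapper around the replace shown above.
function convertShortcodes(text) {
    return text.replace(/\[(\/?)(vc_column|vc_row)([^\]]*?)\]/g, function (match, slash, tag, attrs) {
        // slash is '/' for a closing tag, '' for an opening tag.
        return '<' + slash + 'div' + (slash === '/' ? '' : ' class="' + tag + '"') + attrs + '>';
    });
}

// Tags are converted; ordinary brackets in the text are left alone.
const converted = convertShortcodes('a[vc_row param="x"]b [note] c[/vc_row]d');
console.log(converted);
// prints: a<div class="vc_row" param="x">b [note] c</div>d

// No tags: a single pass returns the input unchanged.
const plain = 'no tags here, only [plain brackets] '.repeat(1000);
console.log(convertShortcodes(plain) === plain);
// prints: true
```

Since the pattern has to begin at a literal [, the engine can reject most positions right away when the text contains no tags.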

var sInput = '...some content[vc_row param="test1"][vc_column]text [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

console.log(sInput.replace(/\[(\/?)(vc_column|vc_row)([^\]]*?)\]/g, function(a,b,c,d) {
    return '<' + b + 'div' + (b==='/' ? '' : ' class="' + c + '"') + d + '>';
    })
    );

Upvotes: 1
