Reputation: 23
I have a problem with a slow regex, but only when the pattern doesn't match. In all other cases performance is acceptable, even if the pattern matches at the end of the text. I'm testing performance on a 100KB text input.
What I am trying to do is take input in an HTML-like syntax that uses [] instead of <> brackets and translate it to valid XML.
Sample input:
...some content[vc_row param="test1"][vc_column]text [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content
Sample output:
...some content<div class="vc_row" param="test1"><div class="vc_column" >text [brackets in text] content</div></div><div class="vc_row" param="xxx">text content</div>...some more content
To do this I am using regex:
/(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/
I run this in a while loop for as long as the pattern matches.
As I mentioned before, this works, but the last iteration is extremely slow (or the first, if nothing matches). Here is the complete JavaScript I am using:
var str = '...some content[vc_row param="test1"][vc_column]text content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';
var regex = /(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/;
var matches;
// Replace one tag per iteration until the pattern no longer matches.
while ((matches = str.match(regex))) {
    if (matches[2].slice(1, 2) !== '/')
        str = matches[1] + "<div class=\"" + matches[2].slice(1) + "\"" + " " + matches[4] + ">" + matches[6];
    else
        str = matches[1] + "</div>" + matches[6];
}
How could I improve my regex's performance when it does not match?
Upvotes: 2
Views: 155
Reputation: 29677
You can split it into two regexes: one for the start tags and one for the closing tags.
Then chain two global (g flag) replaces.
var str = '...some content[vc_row param="test1"][vc_column]text with [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';
const reg1 = /\[(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;
const reg2 = /\[\/(vc_(?:column|row))\s*\]/g;
var result = str.replace(reg1, "<div class=\"$1\"$2>").replace(reg2, "</div>");
console.log(result);
Note that the (.*) groups in the original regex aren't needed this way.
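As for why the original is slow on a failed match: my understanding is that a backtracking engine retries the pattern from every start position, and each time the greedy leading (.*) runs to the end of the line and then backs off one character at a time looking for a tag. So the work grows roughly quadratically with the line length. Here is a rough sketch to compare the two on input without any tags. The timeNoMatch helper and the test string are made up for illustration, and timings will vary by engine and machine.

```javascript
// Hypothetical helper: times a single match attempt against the input.
function timeNoMatch(regex, input) {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched: matched, ms: Date.now() - start };
}

// Made-up input with no vc_ tags at all, so both patterns must fail.
// Kept small on purpose; a 100KB single line is far slower for the first pattern.
const noTags = 'some text without tags '.repeat(200);

// The pattern from the question, with the leading and trailing (.*) groups.
const withLeadingDotStar = /(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/;
// Only the part that identifies a tag; nothing to backtrack over.
const withoutDotStar = /\[\/?vc_(?:column|row)/;

const slow = timeNoMatch(withLeadingDotStar, noTags);
const fast = timeNoMatch(withoutDotStar, noTags);
console.log('with (.*):', slow, 'without (.*):', fast);
```

Increasing the repeat count should make the gap between the two more visible.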
With an anonymous callback function, it can also be done in a single regex replace.
var str = '...some content[vc_row param="test1"][vc_column]text with [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';
const reg = /\[(\/)?(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;
var result = str.replace(reg, function(m,c1,c2,c3){
    if(c1) return "</div>";
    else return "<div class=\""+ c2 +"\""+ (c3?c3:"") +">";
});
console.log(result);
Upvotes: 1
Reputation: 8332
How about a replace? Something like:
str.replace(/\[(\/?)(vc_column|vc_row)([^\]]*?)\]/g, function(a,b,c,d) {
    return '<' + b + 'div' + (b==='/' ? '' : ' class="' + c + '"') + d + '>';
});
This matches a tag (start or end) and all attributes, including brackets, capturing everything except the brackets. It then puts it back together in the correct format (divs with classes).
The global flag (/../g) removes the need for any loops.
var sInput = '...some content[vc_row param="test1"][vc_column]text [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';
console.log(sInput.replace(/\[(\/?)(vc_column|vc_row)([^\]]*?)\]/g, function(a,b,c,d) {
    return '<' + b + 'div' + (b==='/' ? '' : ' class="' + c + '"') + d + '>';
}));
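If you want to reuse this, the same replace can be wrapped in a small function. The sketch below is just one way to do it. The name convertTags and the parameter names are mine, not anything standard. It also shows the "nothing matches" case, where the input comes back unchanged after a single pass.

```javascript
// Hypothetical wrapper around the replace shown above.
function convertTags(input) {
  var tagPattern = /\[(\/?)(vc_column|vc_row)([^\]]*?)\]/g;
  return input.replace(tagPattern, function (whole, slash, name, attrs) {
    // Closing tags become </div>; opening tags keep their attributes.
    return '<' + slash + 'div' + (slash === '/' ? '' : ' class="' + name + '"') + attrs + '>';
  });
}

var converted = convertTags('[vc_row param="test1"][vc_column]text [brackets in text] content[/vc_column][/vc_row]');
console.log(converted);

// No tags in the input: the string is returned as-is.
var untouched = convertTags('plain text with [other brackets] only');
console.log(untouched);
```

Brackets that are not vc_row or vc_column tags are left alone, because the pattern only matches those two names.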
Upvotes: 1