demux

Reputation: 4654

Why does this regex take so long to execute?

I created a regex that's supposed to move text inside of an adjoining <span> tag.

const fix = (string) => string.replace(/([\S]+)*<span([^<]+)*>(.*?)<\/span>([\S]+)*/g, "<span$2>$1$3$4</span>")

fix('<p>Given <span class="label">Butter</span>&#39;s game, the tree counts as more than one input.</p>')
// Results in:
'<p>Given <span class="label">Butter&#39;s</span> game, the tree counts as more than one input.</p>'

But if I pass it a string where there is no text touching a <span> tag, it takes a few seconds to run.

I'm testing this on Chrome and Electron.

Upvotes: 1

Views: 77

Answers (1)

rock321987

Reputation: 11032

([\S]+)* and ([^<]+)* are the culprits: these nested quantifiers cause catastrophic backtracking whenever the match fails, for example when there is no </span>. You need to modify your regex to

([\S]*)<span([^<]*)>(.*?)<\/span>([\S]*)

It will work, but it's still not efficient.
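To see the difference, here is a rough sketch that times the nested quantifier from the question against the single quantifier suggested above. The 20-character input and the `time` helper are made up for illustration, and the timings depend on the engine and machine, so treat the numbers as indicative only.

```javascript
// Hypothetical demo input: one run of 20 non-whitespace characters, no "<span".
const input = "x".repeat(20);

// Nested quantifier from the question: (\S+)* can split the run in
// exponentially many ways before the engine gives up.
const nested = /(\S+)*<span/;
// Single quantifier from the answer: \S* only backtracks one character
// at a time per start position.
const flat = /(\S*)<span/;

// Illustrative helper: time a single test using Date.now() (coarse, but
// usually enough to see the gap).
const time = (re, s) => {
  const start = Date.now();
  const matched = re.test(s);
  return { matched, ms: Date.now() - start };
};

const nestedResult = time(nested, input);
const flatResult = time(flat, input);
console.log("nested:", nestedResult, "flat:", flatResult);
```

Both patterns fail to match, but the nested one does far more work to find that out, and each extra character in the run roughly doubles that work.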

Why use a character class for \S? It is not needed, so the above reduces to

(\S*)<span([^<]*)>(.*?)<\/span>(\S*)
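Plugged back into the `fix` function from the question, the simplified pattern might look like the sketch below. The replacement string is taken unchanged from the question; the `noSpan` sample is an assumed input added only to show the no-match case.

```javascript
// Same replacement as in the question, with the simplified pattern:
// $1 = text touching the tag on the left, $2 = span attributes,
// $3 = span content, $4 = text touching the tag on the right.
const fix = (string) =>
  string.replace(/(\S*)<span([^<]*)>(.*?)<\/span>(\S*)/g, "<span$2>$1$3$4</span>");

const withSpan = fix(
  '<p>Given <span class="label">Butter</span>&#39;s game, the tree counts as more than one input.</p>'
);
console.log(withSpan);
// '<p>Given <span class="label">Butter&#39;s</span> game, the tree counts as more than one input.</p>'

// Assumed sample without any <span>: the string comes back unchanged.
const noSpan = fix("<p>Nothing to wrap in this paragraph.</p>");
console.log(noSpan);
```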

If you are only concerned with the content of the span, use this instead

<span([^<]*)>(.*?)<\/span>

Check here (see the reduction in the number of steps).

NOTE: Finally, don't parse HTML with regex if there are tools that can do it much more easily.

Upvotes: 4
