Hanging .NET regex with high CPU

Question

Here's a weird .NET regex problem that I can't figure out. I'm trying to unparse some HTML in my forum app. I haven't changed the code, but in certain environments, the regex simply never returns. I can reproduce it in the app:

line 66: https://github.com/POPWorldMedia/POPForums/blob/master/PopForums/Services/TextParsingService.cs

text = Regex.Replace(text, @"()", "http://www.youtube.com/watch?v=$4", RegexOptions.IgnoreCase);

The input string it's choking on is:

<p>This is an <strong>important</strong> <em>preview</em> of a post.</p>[quote]<p>This is a quote.<br /></p>[/quote]<p><iframe width="640" height="360" src="http://www.youtube.com/embed/Zey3WWThErw" frameborder="0" allowfullscreen>
O look! YouTube!

It will eventually time out here: http://regexlib.com/RETester.aspx

The host process, IIS in this case, goes to about 50% locally (one core, I assume) and never lets go or returns. I'm completely stumped. The same code is running on one of my sites in Azure and it doesn't choke there.

dee-see · Accepted Answer

The (\S+ )* and ( *\S+)* parts cause a lot of backtracking.

Consider simply replacing them with .*. It is not 100% equivalent, but I think it should work for what you seem to be trying to do.

text = Regex.Replace(text, @"()", "http://www.youtube.com/watch?v=$4", RegexOptions.IgnoreCase);
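Nested quantifiers such as ( *\S+)* let the engine split the same run of non-space characters in many different ways, so a failed match can take a very long time. As a safety net while the pattern is being fixed, here is a minimal sketch that passes a match timeout to Regex.Replace, assuming .NET 4.5 or later (where that overload exists). The SafeRegex class and TryReplace method are hypothetical names invented for this illustration, not part of the forum code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class SafeRegex
{
    // Runs Regex.Replace with a match timeout so that a pattern which
    // backtracks excessively throws instead of pinning a CPU core.
    // Returns false and hands back the unmodified input on timeout.
    public static bool TryReplace(string input, string pattern, string replacement,
                                  TimeSpan timeout, out string result)
    {
        try
        {
            result = Regex.Replace(input, pattern, replacement,
                                   RegexOptions.IgnoreCase, timeout);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up after roughly `timeout`; keep the original text.
            result = input;
            return false;
        }
    }
}
```

A caller could pass something like TimeSpan.FromSeconds(1) and log the input whenever the method returns false. This only contains the symptom; the pattern still needs to be rewritten.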

You'll have other problems with that regex since it matches greedily. You might want to try this instead to make sure you don't have any problems if there are several iframe tags within your text.

text = Regex.Replace(text, @"(<iframe )(.)*?(src=""http://www.youtube.com/embed/)(\S+)("")(.)*?( */iframe>)", "http://www.youtube.com/watch?v=$4", RegexOptions.IgnoreCase);
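To make the greedy versus non-greedy difference concrete, here is a small sketch that runs the non-greedy pattern above next to a greedy variant on text containing two iframe tags. The greedy pattern, the IframeDemo class, and the sample input with video IDs AAA and BBB are assumptions written for this comparison only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named only for this illustration.
public static class IframeDemo
{
    // Non-greedy pattern, as given in the answer.
    public const string LazyPattern =
        @"(<iframe )(.)*?(src=""http://www.youtube.com/embed/)(\S+)("")(.)*?( */iframe>)";

    // Assumed greedy variant, written here only for comparison.
    public const string GreedyPattern =
        @"(<iframe )(.*)(src=""http://www.youtube.com/embed/)(\S+)("")(.*)( */iframe>)";

    // Made-up input with two embedded videos on one line.
    public const string Sample =
        @"<iframe src=""http://www.youtube.com/embed/AAA""></iframe> and " +
        @"<iframe src=""http://www.youtube.com/embed/BBB""></iframe>";

    // Replaces each matched iframe with a plain watch URL built from group 4.
    public static string ToWatchUrls(string text, string pattern)
    {
        return Regex.Replace(text, pattern,
                             "http://www.youtube.com/watch?v=$4",
                             RegexOptions.IgnoreCase);
    }
}
```

With the non-greedy pattern, IframeDemo.ToWatchUrls(IframeDemo.Sample, IframeDemo.LazyPattern) gives "http://www.youtube.com/watch?v=AAA and http://www.youtube.com/watch?v=BBB". With the greedy variant, the single match stretches from the first iframe to the last closing tag, so only "http://www.youtube.com/watch?v=BBB" is left and the first video is lost.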

As always, you should also consider using an HTML parser instead of regex for this kind of task.
