Joey Morani
Reputation: 26591

Optimise Regex Replace in a for loop?

This is basically a follow-up of my previous question. I've been using this code to replace strings contained in an array:

string[] replacements = {"these",
                         "words",
                         "will",
                         "get",
                         "replaced"};

string newString = "Hello.replacedthesewordswillgetreplacedreplaced";

for (int j = 0; j < replacements.Length; j++)
{
    newString = Regex.Replace(newString,
    @"((?<firstMatch>(" + replacements[j] + @"))(\k<firstMatch>)*)",
    m => "[" + j + "," + (m.Groups[3].Captures.Count + 1) + "]");
}

After running this code newString will be:

Hello.[4,1][0,1][1,1][2,1][3,1][4,2]

This works fine for a small number of replacements like the one above; the strings are replaced almost instantly. With a large number of replacements, however, it slows down noticeably.

Can anyone see a way I can optimise it so it replaces faster?

I'm assuming the for loop is what's slowing it down. There are always some strings contained in the array which don't need to be replaced (because they aren't contained in the main newString string) so I wonder if there's a way to check that before the for loop. That might turn out to be slower though...

I can't think of a better way to do this so I thought I'd ask. Thanks for the help guys! :)

Upvotes: 1

Views: 411

Answers (1)

Rich O'Kelly

Reputation: 41767

A couple of methods to try (NB both untested, but I believe they should work and be quicker than your current code).

One using a static compiled Regex:

private static readonly Dictionary<string, int> Indexes = new Dictionary<string, int> 
{
  { "these", 0 },
  { "words", 1 },
  { "will", 2 },
  { "get", 3 },
  { "replaced", 4 },
};

private static readonly Regex ReplacementRegex = new Regex(string.Join("|", Indexes.Keys), RegexOptions.Compiled);

...
// Running count of how often each word has been seen so far (starts at 0).
var occurrences = Indexes.Keys.ToDictionary(k => k, k => 0);
return ReplacementRegex.Replace(newString, m => {
  var count = occurrences[m.Value];
  occurrences[m.Value] = count + 1;
  return "[" + Indexes[m.Value] + "," + count + "]";
});

And without a regex:

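for (int j = 0; j < replacements.Length; j++)
{
  var index = 0;
  var count = 0; // running count of matches for this word
  var replacement = replacements[j];
  // Ordinal comparison skips culture-sensitive matching, which is slower.
  while ((index = newString.IndexOf(replacement, index, StringComparison.Ordinal)) > -1)
  {
    count++;
    var token = "[" + j + "," + count + "]";
    newString = newString.Substring(0, index) + token + newString.Substring(index + replacement.Length);
    index += token.Length; // resume searching after the inserted token
  }
}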
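
If the output has to match the question's format exactly (the second number being the length of a run of consecutive repeats, not a running total), a single-pass variant might look like the sketch below. It has not been benchmarked. The `RunReplacer` class and `ReplaceRuns` method are names made up for illustration, and it assumes that no replacement word is a prefix of another, since the alternation takes the first alternative that matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RunReplacer // hypothetical helper, not part of the original code
{
    public static string ReplaceRuns(string input, string[] replacements)
    {
        // An empty alternation would match the empty string everywhere.
        if (replacements.Length == 0) return input;

        // Map each word to its position in the array for fast lookup.
        var indexes = new Dictionary<string, int>();
        for (int i = 0; i < replacements.Length; i++)
            indexes[replacements[i]] = i;

        // One alternation of all (escaped) words, followed by any number of
        // immediate repeats of whichever word matched.
        string pattern = "(?<word>"
            + string.Join("|", replacements.Select(Regex.Escape))
            + @")(?<rep>\k<word>)*";
        var regex = new Regex(pattern);

        // The string is scanned once; each run becomes [index,runLength].
        return regex.Replace(input, m =>
            "[" + indexes[m.Groups["word"].Value] + ","
                + (m.Groups["rep"].Captures.Count + 1) + "]");
    }
}
```

Called with the array and string from the question, `RunReplacer.ReplaceRuns(newString, replacements)` should return `Hello.[4,1][0,1][1,1][2,1][3,1][4,2]`. The main saving is that the input is scanned once instead of once per word, and no intermediate strings are built between passes.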

Upvotes: 1