Ivan Zlatanov
Ivan Zlatanov

Reputation: 5226

An efficient solution to a String.Replace problem?

Say I have an array of Strings declared like this:

String[] strings = new String[ 1024 ];

And given an array of key value pairs declared like this:

KeyValuePair<String, String>[] pairs = new KeyValuePair<String, String>[ 50 ];

What are the possible workarounds for a solution that would do this:

for ( int i = 0; i < 1024; ++i )
{
  foreach ( var kv in pairs )
  {
      strings[ i ] = strings[ i ].Replace( kv.Key, kv.Value );
  }
}

This code is only arbitrary and just for showing what the actual problem is. Given a lot of key values, which are known at compile time, how can I do an efficient String.Replace, and possibly reduce the method calls and copying the String around ( each call to String.Replace will produce a new immutable String, not very efficient when having a lot of those replace calls )?

Upvotes: 4

Views: 1640

Answers (5)

You can use the advices received (use StringBuilder for example) and with Parallel extensions, use how many cores you have in your machine to do the work in parallel.

Look at this code:

class Program {

    static void Main(String[] args) {

        // Filling the data
        List<KeyValuePair<String, String>> map = new List<KeyValuePair<String, String>>();
        List<StringBuilder> strings = new List<StringBuilder>();
        List<StringBuilder> strings2 = new List<StringBuilder>();

        for (Int32 i = 0; i < 50; i++) {
            String key = String.Format("[KEY{0}]", i);
            String value = String.Format("Text of KEY{0}", i);
            KeyValuePair<String, String> keyValuePair = new KeyValuePair<String, String>(key, value);
            map.Add(keyValuePair);
        }

        for (Int32 i = 0; i < 1024; i++) {
            StringBuilder text = new StringBuilder();

            foreach (KeyValuePair<String, String> keyValuePair in map) {
                text.AppendFormat("Some text before - {0} - Some text after.", keyValuePair.Key);
                text.AppendLine();
            }

            strings.Add(text);
            strings2.Add(text);
        }


        // Measuring the normal loop
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        foreach (StringBuilder text in strings) {
            foreach (KeyValuePair<String, String> eachMap in map) {
                text.Replace(eachMap.Key, eachMap.Value);
            }
        }

        stopwatch.Stop();

        Console.WriteLine("Time with normal loop: {0}", stopwatch.Elapsed);


        // Measuring the parallel loop
        stopwatch.Reset();
        stopwatch.Start();

        Parallel.ForEach(strings2, text => {
            foreach (KeyValuePair<String, String> eachMap in map) {
                text.Replace(eachMap.Key, eachMap.Value);
            }
        });

        stopwatch.Stop();

        Console.WriteLine("Time with parallel: {0}", stopwatch.Elapsed);
        Console.ReadLine();
    }
}

And look at some measures running on my nootebook (AMD Turion64 X2 - 2 cores):



Time with normal loop: 00:00:03.5956428
Time with parallel: 00:00:01.8707367

Time with normal loop: 00:00:02.1467821
Time with parallel: 00:00:01.4627365

Time with normal loop: 00:00:03.4123084
Time with parallel: 00:00:01.6704408



Hope this helps.

Ricardo Lacerda Castelo Branco

Upvotes: 1

Joe
Joe

Reputation: 42607

It doesn't help with the number of loops, but if you use a StringBuilder as an intermediate it has a .Replace call with the same set of parameter signatures.

Edit:

Not sure if it's faster, but you can use Regex.Replace with an evaluator delegate.

If you build a search regex with your keys: (key1|key2|key3|key4...)

and then pass in the delegate to .Replace, you can return a lookup based on the Match's Value property.

  public string ReplaceData(Match m)
  {
      return pairs[m.Value];         
  }

...

  pairs.Add("foo","bar");
  pairs.Add("homer","simpson");
  Regex r = new Regex("(?>foo|homer)");
  MatchEvaluator myEval = new MatchEvaluator(class.ReplaceData);
  string sOutput = r.Replace(sInput, myEval);

Upvotes: 2

Dan Tao
Dan Tao

Reputation: 128317

I think what Joe and Agent_9191 are suggesting is essentially the same:

const int NUM_STRINGS = 1024;
string[] strings = new string[NUM_STRINGS];
StringBuilder[] stringBuilders = new StringBuilder[NUM_STRINGS];

// ensure that all StringBuilder objects are initialized
for (int i = 0; i < NUM_STRINGS; i++) {
    stringBuilders[i] = new StringBuilder(strings[i]);
}

KeyValuePair<string, string>[] pairs = new KeyValuePair<string, string>[50];

for (int i = 0; i < NUM_STRINGS; i++) {
    foreach (var kv in pairs) {
         // StringBuilder is mutable;
         // this actually modifies the instance
         stringBuilders[i].Replace(kv.Key, kv.Value);
    }

    strings[i] = stringBuilders[i].ToString();
}

Upvotes: 0

Stan R.
Stan R.

Reputation: 16065

This should do less string manipulation by selecting the values

 String[] strings = new String[1024];
 KeyValuePair<String, String>[] pairs = new KeyValuePair<String, String>[ 50 ];

 String[] replaced = strings.Select(x => 
                                         pairs.Any( y => y.Key == x ) ? 
                                         pairs.Where( z => z.Key == x).Select( val =>  val.Value).First() :  x )
                            .ToArray();

Upvotes: 0

Agent_9191
Agent_9191

Reputation: 7253

I'd say dump the strings into a List and then with each of the stringbuilders perform the replace calls. That will save on the creation of extra immutable string objects. Most likely a little overhead if you're only going to have a small number of replacements, but since you've stated there will be a lot of them then this should help.

Upvotes: 2

Related Questions