Jon Galloway
Jon Galloway

Reputation: 53135

Count matching characters between two strings using LINQ

A friend asked me how to improve some code with LINQ. How would you do a character by character comparison between two strings to count the number of matches at an index? Here's the original code, could it be improved with LINQ?

private int Fitness(string individual, string target)
   {
       int sum = 0;
       for (int i = 0; i < individual.Length; i++)
           if (individual[i] == target[i]) sum++;
       return sum;
   }

Upvotes: 5

Views: 5138

Answers (5)

Jan
Jan

Reputation: 5254

I am a little puzzled. The PO asked to improve the solution with Linq. Benchmarking the solutions I received the following result in the table (code see below). Unless I am missing something (in which case I'd appreciate comments) I cannot see that Linq improves the simple solution with the for-loop.

Result: the for loop is a lot faster and needs less memory.

Consequence: I would keep the simple for-loop. The argument of Conciseness that favors Linq does not apply here because the code is hidden away in a method. And the performance is not better but worse.

I like Linq. Tried to work without it, recently, and failed. But it is not always better.

Method Mean Error StdDev Gen 0 Allocated
RunLinqZip 1,179.16 ns 2.362 ns 1.844 ns 0.1831 1,152 B
RunLinqRange 556.54 ns 1.359 ns 1.135 ns 0.1726 1,088 B
RunForLoop 63.84 ns 0.101 ns 0.094 ns - -
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;


namespace Benchmarks
{
    public class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<MemoryBenchmarkerDemo>();
        }
    }


    [MemoryDiagnoser]
    public class MemoryBenchmarkerDemo
    {
        public string[] Id = new string[] { "ATTR", "TAL", "SPELL", "LITURGY", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };
        public string[] Att = new string[] { "ATTR_2", "TAL_99", "SPELL_232", "LITURGY_123", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };


        public int FitnessFor(string individual, string target)
        {
            int sum = 0;
            for (int i = 0; i < Math.Min(individual.Length, target.Length); i++)
                if (individual[i] == target[i]) sum++;
            return sum;
        }

        public int FitnessRange(string individual, string target)
        {
            return Enumerable.Range(0, individual.Length)
                     .Count(i => individual[i] == target[i]);
        }

        public int FitnessZip(string individual, string target)
        {
            return individual.Zip(target).Count(pair => pair.Item1 == pair.Item2);
        }



        [Benchmark]
        public void RunLinqZip()
        {
            for(int i = 0; i< Id.Length; i++)
            {
                FitnessZip(Id[i], Att[i]);
            }
        }

        [Benchmark]
        public void RunLinqRange()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessRange(Id[i], Att[i]);
            }
        }


        [Benchmark]
        public void RunForLoop()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessFor(Id[i], Att[i]);
            }
        }

    }
}

Upvotes: 0

jamesrom
jamesrom

Reputation: 866

The currently selected answer doesn't handle different length strings well. It only compares a maximum substring of the input strings if the lengths differ.

For example:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> 3

This can be easily fixed by subtracting the differing lengths from your score. The improvement:

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i])
                 - Math.Abs(individual.Length - target.Length);

Now:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> -23

Technically, there are 23 characters difference between those two inputs.

Upvotes: 0

Mehrdad Afshari
Mehrdad Afshari

Reputation: 422076

return Enumerable.Range(0, individual.Length)
                 .Count(i => individual[i] == target[i]);

A more fool-proof way would be (the above snippet will fail if target is shorter than individual):

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i]);

I believe the code is correct as is. Enumerable.Range method takes two arguments. The first of which is the start index (should be 0), the second is the count of items. The complete code snippet to test and make sure:

class Program {
  static void Main(string[] args) {
      Console.WriteLine(Fitness("hello", "world"));
  }
  static int Fitness(string individual, string target) {
      return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                       .Count(i => individual[i] == target[i]);
  }
}

Upvotes: 7

spender
spender

Reputation: 120496

How about a join, or did I misunderstand?

    static void Main(string[] args)
    {
        var s1 = "abcde";
        var s2 = "hycdh";
        var count = s1.Join(s2, c => c, c => c, (a,b) => a).Count();
        Console.WriteLine(count);
        Console.ReadKey();
    }

Upvotes: -1

Marc Gravell
Marc Gravell

Reputation: 1063198

You could write something similar with LINQ, but since "Zip" isn't built in until .NET 4.0 it will be more code than we'd like, and/or not as efficient. I'd be tempted to leave it "as is", but I'd probably check target.Length to avoid an out-of-range exception.

Perhaps I'd make an extension method, though:

public static int CompareFitness(this string individual, string target)
{
   int sum = 0, len = individual.Length < target.Length
        ? individual.Length : target.Length;
   for (int i = 0; i < len; i++)
       if (individual[i] == target[i]) sum++;
   return sum;
}

Then you can use:

string s = "abcd";
int i = s.CompareFitness("adc"); // 2

Upvotes: 2

Related Questions