Reputation: 53135
A friend asked me how to improve some code with LINQ. How would you do a character by character comparison between two strings to count the number of matches at an index? Here's the original code, could it be improved with LINQ?
private int Fitness(string individual, string target)
{
int sum = 0;
for (int i = 0; i < individual.Length; i++)
if (individual[i] == target[i]) sum++;
return sum;
}
Upvotes: 5
Views: 5138
Reputation: 5254
I am a little puzzled. The PO asked to improve the solution with Linq. Benchmarking the solutions I received the following result in the table (code see below). Unless I am missing something (in which case I'd appreciate comments) I cannot see that Linq improves the simple solution with the for-loop.
Result: the for loop is a lot faster and needs less memory.
Consequence: I would keep the simple for-loop. The argument of Conciseness that favors Linq does not apply here because the code is hidden away in a method. And the performance is not better but worse.
I like Linq. Tried to work without it, recently, and failed. But it is not always better.
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
RunLinqZip | 1,179.16 ns | 2.362 ns | 1.844 ns | 0.1831 | 1,152 B |
RunLinqRange | 556.54 ns | 1.359 ns | 1.135 ns | 0.1726 | 1,088 B |
RunForLoop | 63.84 ns | 0.101 ns | 0.094 ns | - | - |
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
namespace Benchmarks
{
public class Program
{
static void Main(string[] args)
{
var summary = BenchmarkRunner.Run<MemoryBenchmarkerDemo>();
}
}
[MemoryDiagnoser]
public class MemoryBenchmarkerDemo
{
public string[] Id = new string[] { "ATTR", "TAL", "SPELL", "LITURGY", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };
public string[] Att = new string[] { "ATTR_2", "TAL_99", "SPELL_232", "LITURGY_123", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };
public int FitnessFor(string individual, string target)
{
int sum = 0;
for (int i = 0; i < Math.Min(individual.Length, target.Length); i++)
if (individual[i] == target[i]) sum++;
return sum;
}
public int FitnessRange(string individual, string target)
{
return Enumerable.Range(0, individual.Length)
.Count(i => individual[i] == target[i]);
}
public int FitnessZip(string individual, string target)
{
return individual.Zip(target).Count(pair => pair.Item1 == pair.Item2);
}
[Benchmark]
public void RunLinqZip()
{
for(int i = 0; i< Id.Length; i++)
{
FitnessZip(Id[i], Att[i]);
}
}
[Benchmark]
public void RunLinqRange()
{
for (int i = 0; i < Id.Length; i++)
{
FitnessRange(Id[i], Att[i]);
}
}
[Benchmark]
public void RunForLoop()
{
for (int i = 0; i < Id.Length; i++)
{
FitnessFor(Id[i], Att[i]);
}
}
}
}
Upvotes: 0
Reputation: 866
The currently selected answer doesn't handle different length strings well. It only compares a maximum substring of the input strings if the lengths differ.
For example:
Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> 3
This can be easily fixed by subtracting the differing lengths from your score. The improvement:
return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
.Count(i => individual[i] == target[i])
- Math.Abs(individual.Length - target.Length);
Now:
Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> -23
Technically, there are 23 characters difference between those two inputs.
Upvotes: 0
Reputation: 422076
return Enumerable.Range(0, individual.Length)
.Count(i => individual[i] == target[i]);
A more fool-proof way would be (the above snippet will fail if target
is shorter than individual
):
return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
.Count(i => individual[i] == target[i]);
I believe the code is correct as is. Enumerable.Range
method takes two arguments. The first of which is the start index (should be 0
), the second is the count of items. The complete code snippet to test and make sure:
class Program {
static void Main(string[] args) {
Console.WriteLine(Fitness("hello", "world"));
}
static int Fitness(string individual, string target) {
return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
.Count(i => individual[i] == target[i]);
}
}
Upvotes: 7
Reputation: 120496
How about a join, or did I misunderstand?
static void Main(string[] args)
{
var s1 = "abcde";
var s2 = "hycdh";
var count = s1.Join(s2, c => c, c => c, (a,b) => a).Count();
Console.WriteLine(count);
Console.ReadKey();
}
Upvotes: -1
Reputation: 1063198
You could write something similar with LINQ, but since "Zip" isn't built in until .NET 4.0 it will be more code than we'd like, and/or not as efficient. I'd be tempted to leave it "as is", but I'd probably check target.Length
to avoid an out-of-range exception.
Perhaps I'd make an extension method, though:
public static int CompareFitness(this string individual, string target)
{
int sum = 0, len = individual.Length < target.Length
? individual.Length : target.Length;
for (int i = 0; i < len; i++)
if (individual[i] == target[i]) sum++;
return sum;
}
Then you can use:
string s = "abcd";
int i = s.CompareFitness("adc"); // 2
Upvotes: 2