Reputation: 624
I have two string arrays, newArray and oldArray, and I want to use the Enumerable.Except method to remove every item in newArray that is also in oldArray, then write the result to a CSV file.
However, I need a custom comparer so that formatting differences are ignored (if an item contains a newline character in one array but not in the other, I don't want that item written to the file).
My code as of now:
    string newString = File.ReadAllText(csvOutputFile1);
    string[] newArray = newString.Split(new string[] { sentinel }, StringSplitOptions.RemoveEmptyEntries);
    string oldString = File.ReadAllText(csvOutputFile2);
    string[] oldArray = oldString.Split(new string[] { sentinel }, StringSplitOptions.None);
    IEnumerable<string> differnceQuery = newArray.Except(oldArray, new Comparer());
    using (var wtr = new StreamWriter(diffFile))
    {
        foreach (var s in differnceQuery)
        {
            wtr.WriteLine(s.Trim() + "#!#");
        }
    }
and the custom comparer class:
    class Comparer : IEqualityComparer<string>
    {
        public bool Equals(string x, string y)
        {
            x = x.ToString().Replace(" ", "").Replace("\n", "").Replace("\r", "");
            y = y.ToString().Replace(" ", "").Replace("\n", "").Replace("\r", "");
            if (x == y)
                return true;
            else
                return false;
        }
        public int GetHashCode(string row)
        {
            int hCode = row.GetHashCode();
            return hCode;
        }
    }
The resulting file does not omit the items that differ only in formatting. It catches items that are in newArray but not in oldArray (as it should), but it also writes items that differ only by a \n or similar, even though my custom comparer strips those characters.
The thing I really don't understand is that when I debug and step through my code, I can see pairs of items being compared in my custom comparer class, but only when they are equal. If, for example, the string "This is\nthe 1st term" is in newArray and the string "This is the first array" is in oldArray, the debugger doesn't even enter the comparer class and instead jumps straight to the WriteLine part of my code in the main class.
Upvotes: 1
Views: 500
Reputation: 1063874
Simply: your hash-code does not correctly mirror your equality method. Strings like "a b c" and "abc" would return different values from GetHashCode, so it would never get around to testing Equals. GetHashCode must return the same result for any two values that could be equal. It is not, however, necessary that two strings that are not equal return different hash-codes (although it is highly desirable, otherwise everything will go into the same hash-bucket).
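To make the mismatch concrete, here is a minimal sketch that reproduces it. The names BrokenComparer, EqualsCalls and HashMismatchDemo are made up for illustration; the comparer logic mirrors the one in the question. Because string hash codes are not stable across runtimes (and a collision is theoretically possible), the printed values are described as typical rather than guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer that mirrors the question's code:
// Equals ignores whitespace, but GetHashCode hashes the raw string.
class BrokenComparer : IEqualityComparer<string>
{
    public int EqualsCalls; // counts how often Equals is actually invoked

    static string Strip(string s) =>
        s.Replace(" ", "").Replace("\n", "").Replace("\r", "");

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return Strip(x) == Strip(y);
    }

    public int GetHashCode(string row) => row.GetHashCode();
}

static class HashMismatchDemo
{
    static void Main()
    {
        var comparer = new BrokenComparer();

        // Called directly, Equals reports the two strings as equal...
        Console.WriteLine(comparer.Equals("a b c", "abc")); // True

        // ...but Except only calls Equals for items whose hash codes match,
        // so "a b c" normally survives (barring a hash collision).
        comparer.EqualsCalls = 0;
        var result = new[] { "a b c" }.Except(new[] { "abc" }, comparer).ToList();
        Console.WriteLine(result.Count);         // normally 1
        Console.WriteLine(comparer.EqualsCalls); // normally 0
    }
}
```

This also matches what the debugger shows in the question: the comparer's Equals is only entered for pairs whose raw hash codes already agree, which in practice means identical strings.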
I guess you could use:
    // warning: probably not very efficient
    return row.Replace(" ", "").Replace("\n", "").Replace("\r", "").GetHashCode();
but that looks pretty expensive (lots of potential for garbage strings to be generated all the time).
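If the allocations turn out to matter, one possible alternative is to skip the ignored characters while hashing and comparing, instead of building stripped copies. The sketch below is an illustration under that assumption, not a tuned implementation: the class name WhitespaceInsensitiveComparer, the helper IsIgnored, and the simple multiply-by-31 hash are all choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: treats ' ', '\n' and '\r' as insignificant
// without allocating intermediate strings.
class WhitespaceInsensitiveComparer : IEqualityComparer<string>
{
    static bool IsIgnored(char c) => c == ' ' || c == '\n' || c == '\r';

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;

        int i = 0, j = 0;
        while (true)
        {
            // Advance both cursors past ignored characters.
            while (i < x.Length && IsIgnored(x[i])) i++;
            while (j < y.Length && IsIgnored(y[j])) j++;

            // Equal only if both strings run out at the same time.
            if (i == x.Length || j == y.Length)
                return i == x.Length && j == y.Length;

            if (x[i] != y[j]) return false;
            i++;
            j++;
        }
    }

    public int GetHashCode(string row)
    {
        if (row == null) return 0;
        unchecked
        {
            // Hash only the characters that Equals also looks at,
            // so equal strings always produce equal hash codes.
            int hash = 17;
            foreach (char c in row)
                if (!IsIgnored(c)) hash = hash * 31 + c;
            return hash;
        }
    }
}

static class ExceptDemo
{
    static void Main()
    {
        var newArray = new[] { "This is\nthe 1st term", "brand new" };
        var oldArray = new[] { "This is the 1st term" };

        var diff = newArray.Except(oldArray, new WhitespaceInsensitiveComparer());
        Console.WriteLine(string.Join(", ", diff)); // brand new
    }
}
```

The important property is that GetHashCode and Equals ignore exactly the same characters; whether this is actually faster than the chained Replace version would need measuring on real data.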
Upvotes: 3