Reputation: 67918
I am grouping log records by a RegEx pattern. After grouping them, I'd like to get a Distinct count of the records for each group. For this example, Distinct is defined as the same visit key and the same year, month, day, hour, and minute. It's just a way of getting a more accurate count of something that gets logged all the way up the stack by different consumers.
Alright, so I'm grouping them like this:
var knownMessages = logRecords
    .Where(record => !string.IsNullOrEmpty(record.InclusionPattern))
    .GroupBy(record => new
    {
        MessagePattern = record.InclusionPattern
    })
    .Select(g => new KnownMessage
    {
        MessagePattern = g.Key.MessagePattern,
        Count = g.Distinct().Count(), // <-- this is the count that doesn't come out distinct
        Records = g.ToList()
    })
    .OrderByDescending(o => o.Count);
And GetHashCode for the record type is implemented like this:
public override int GetHashCode()
{
    var visitKeyHash = this.VisitKey == null ?
        251 : this.VisitKey.GetHashCode();
    var timeHash = this.Time.Year + this.Time.Month + this.Time.Day +
        this.Time.Hour + this.Time.Minute;
    return ((visitKeyHash * 251) + timeHash) * 251;
}
But, for example, in the list I have three records that return the same hash code, 1439926797, yet I still get a count of 3. I know Distinct is calling GetHashCode (as I expected) to do the comparison, because I have a breakpoint there to see what the hash code is.
What did I miss?
Upvotes: 1
Views: 440
Reputation: 127603
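First let me repeat what I said in my comment. The logic is: if a.GetHashCode() != b.GetHashCode(), then a != b; if a.GetHashCode() == b.GetHashCode() && a.Equals(b), then a == b. All GetHashCode() does for you is let you skip the Equals() check when two values have different hashes. That is why you need to implement both. If you only implement Equals(), the a.GetHashCode() == b.GetHashCode() step usually fails for separate instances, and the Equals() you implemented is never tried. Conversely, if you only implement GetHashCode(), as appears to be the case here, the hashes match but the inherited Equals() compares references, so three separate record objects still count as 3.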
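A minimal sketch of that point, using two hypothetical classes, HashOnlyRecord and FullRecord (neither is from the question), with the question's VisitKey and Time properties, assumed here to be a string and a DateTime. Both produce the same hash for three records logged within the same minute, but only the one that also overrides Equals collapses them into one.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the question's record type (its name isn't given).
// VisitKey is assumed to be a string and Time a DateTime.
public class HashOnlyRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }

    // Overrides GetHashCode only; Equals is inherited from object (reference equality).
    public override int GetHashCode() =>
        unchecked((VisitKey?.GetHashCode() ?? 251) * 251 + Time.Minute);
}

public class FullRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }

    private static DateTime ToMinute(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0);

    // Same hash as HashOnlyRecord...
    public override int GetHashCode() =>
        unchecked((VisitKey?.GetHashCode() ?? 251) * 251 + Time.Minute);

    // ...plus an Equals that agrees with it: same visit key, same minute.
    public override bool Equals(object obj) =>
        obj is FullRecord other &&
        VisitKey == other.VisitKey &&
        ToMinute(Time) == ToMinute(other.Time);
}

public static class DistinctDemo
{
    // Builds three records for the same visit within one minute and counts the distinct ones.
    public static int CountDistinct<T>(Func<string, DateTime, T> make)
    {
        var start = new DateTime(2024, 1, 1, 10, 30, 0);
        var records = new[]
        {
            make("abc", start),
            make("abc", start.AddSeconds(5)),
            make("abc", start.AddSeconds(40)),
        };
        return records.Distinct().Count();
    }
}
```

Calling `DistinctDemo.CountDistinct((k, t) => new HashOnlyRecord { VisitKey = k, Time = t })` returns 3, while the same call with `FullRecord` returns 1.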
GetHashCode() should be fast, and its value should not change while the object sits in a collection that depends on that value. So don't modify VisitKey or Time if you are storing these objects as keys in a Dictionary, in a HashSet, or in something similar.
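To see why mutation matters, here is a small sketch with a hypothetical MutableKey class whose hash depends on an int VisitId (an int rather than the question's VisitKey, only so the hashes are predictable). Once the object is in a HashSet and its VisitId changes, the set can no longer find it by either the old or the new value.

```csharp
using System.Collections.Generic;

// Hypothetical key type; VisitId stands in for a hash-relevant field like VisitKey.
public class MutableKey
{
    public int VisitId { get; set; }

    public override int GetHashCode() => VisitId;

    public override bool Equals(object obj) =>
        obj is MutableKey other && VisitId == other.VisitId;
}

public static class MutationDemo
{
    public static (bool FoundSameInstance, bool FoundOldValue) Run()
    {
        var key = new MutableKey { VisitId = 1 };
        var set = new HashSet<MutableKey> { key };

        // Changing the field changes the hash while the object sits in the set.
        key.VisitId = 2;

        // The entry was stored with hash 1, so a lookup with hash 2 misses it...
        bool foundSameInstance = set.Contains(key);
        // ...and a lookup with hash 1 reaches the entry, but Equals now compares 2 with 1.
        bool foundOldValue = set.Contains(new MutableKey { VisitId = 1 });

        return (foundSameInstance, foundOldValue);
    }
}
```

Both lookups return false: the object is still in the set, but it is effectively lost.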
So all you need to do is:
public override int GetHashCode()
{
    var visitKeyHash = this.VisitKey == null ?
        251 : this.VisitKey.GetHashCode();
    var timeHash = this.Time.Year + this.Time.Month + this.Time.Day +
        this.Time.Hour + this.Time.Minute;
    return ((visitKeyHash * 251) + timeHash);
}
public override bool Equals(object obj)
{
    //Two quick tests before we start doing all the math.
    if (Object.ReferenceEquals(this, obj))
        return true;
    // LogRecord is a placeholder for the record type that declares VisitKey and
    // Time; the question doesn't name it (KnownMessage is the group type).
    LogRecord message = obj as LogRecord;
    if (Object.ReferenceEquals(message, null))
        return false;
    // Static Object.Equals tolerates a null VisitKey, just as GetHashCode does.
    return Object.Equals(this.VisitKey, message.VisitKey) &&
        this.Time.Year.Equals(message.Time.Year) &&
        this.Time.Month.Equals(message.Time.Month) &&
        this.Time.Day.Equals(message.Time.Day) &&
        this.Time.Hour.Equals(message.Time.Hour) &&
        this.Time.Minute.Equals(message.Time.Minute);
}
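If changing the record type isn't convenient, a different approach (not one proposed in these answers) is to project each record onto a value tuple, which already implements structural Equals and GetHashCode, and count the distinct tuples. LogRecord below is a hypothetical stand-in for the question's unnamed record type, assuming a string VisitKey and a DateTime Time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the question doesn't name its log record class.
public class LogRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }
    public string InclusionPattern { get; set; }
}

public static class TupleKeyDemo
{
    // Counts records that differ in visit key or in the minute they were logged.
    // The tuple's own Equals/GetHashCode compare both elements, nulls included.
    public static int DistinctPerMinute(IEnumerable<LogRecord> records) =>
        records
            .Select(r => (r.VisitKey, Minute: new DateTime(
                r.Time.Year, r.Time.Month, r.Time.Day, r.Time.Hour, r.Time.Minute, 0)))
            .Distinct()
            .Count();
}
```

In the question's query this would read `Count = TupleKeyDemo.DistinctPerMinute(g),`. The trade-off is that the equality rule lives in the query rather than on the type, so other hash-based collections holding the records won't share it.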
Upvotes: 2
Reputation: 113352
You don't show your Equals override. As with hash-based collections such as Dictionary and HashSet, the internal structure used by Distinct() uses GetHashCode() to decide where to store and look up each item, but Equals to determine actual equality.
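The same two-method contract applies when you hand Distinct() an IEqualityComparer<T> instead of overriding the methods on the type. A sketch, assuming a hypothetical LogRecord class with a string VisitKey and a DateTime Time; the hypothetical SameMinuteComparer counts its Equals calls only to show that Distinct really consults Equals once the hashes match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type standing in for the question's unnamed class.
public class LogRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }
}

// Hypothetical comparer: "equal" means same visit key and same minute.
public sealed class SameMinuteComparer : IEqualityComparer<LogRecord>
{
    public int EqualsCalls { get; private set; }

    private static DateTime ToMinute(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0);

    public bool Equals(LogRecord x, LogRecord y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.VisitKey == y.VisitKey && ToMinute(x.Time) == ToMinute(y.Time);
    }

    // Uses only the fields Equals compares, so equal records get equal hashes.
    public int GetHashCode(LogRecord r) =>
        unchecked((r.VisitKey?.GetHashCode() ?? 251) * 251 + ToMinute(r.Time).GetHashCode());
}

public static class ComparerDemo
{
    public static (int Count, int EqualsCalls) Run()
    {
        var start = new DateTime(2024, 1, 1, 10, 30, 0);
        var records = new[]
        {
            new LogRecord { VisitKey = "abc", Time = start },
            new LogRecord { VisitKey = "abc", Time = start.AddSeconds(15) },
            new LogRecord { VisitKey = "abc", Time = start.AddSeconds(45) },
        };
        var comparer = new SameMinuteComparer();
        int count = records.Distinct(comparer).Count();
        return (count, comparer.EqualsCalls);
    }
}
```

Here the three records collapse to a count of 1, and the comparer records at least one Equals call.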
The problem could be a bug either in your Equals or in your GetHashCode, but in the latter case the bug is that GetHashCode doesn't correctly match your Equals (GetHashCode must return the same hash for two objects for which Equals returns true, but can of course also return the same hash for two different objects), which makes it a bug in the pair of methods. So either way, the problem is directly or indirectly in your override of Equals.
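A sketch of the "bug in the pair" case, using a hypothetical MismatchedRecord whose Equals ignores seconds but whose GetHashCode mixes them in. Two such records can compare equal and still both survive Distinct(), because their hashes differ and Equals is never consulted for them.

```csharp
using System;
using System.Linq;

// Hypothetical record type with a broken Equals/GetHashCode pair.
public class MismatchedRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }

    private static DateTime ToMinute(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0);

    // Equals ignores seconds...
    public override bool Equals(object obj) =>
        obj is MismatchedRecord other &&
        VisitKey == other.VisitKey &&
        ToMinute(Time) == ToMinute(other.Time);

    // ...but the hash includes them, violating "equal objects => equal hashes".
    public override int GetHashCode() =>
        unchecked(((VisitKey?.GetHashCode() ?? 251) * 251 + Time.Minute) * 61 + Time.Second);
}

public static class MismatchDemo
{
    public static (bool AreEqual, int DistinctCount) Run()
    {
        var start = new DateTime(2024, 1, 1, 10, 30, 0);
        var a = new MismatchedRecord { VisitKey = "abc", Time = start };
        var b = new MismatchedRecord { VisitKey = "abc", Time = start.AddSeconds(40) };
        return (a.Equals(b), new[] { a, b }.Distinct().Count());
    }
}
```

`MismatchDemo.Run()` reports that the two records are equal, yet Distinct() keeps both of them.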
Upvotes: 2
Reputation: 203821
It seems you have not overridden the Equals method to use the same definition of equality as your hash code generation algorithm. Since Equals is what resolves hash collisions, it is important that the two always stay in sync.
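A sketch showing that Equals is what settles collisions: a hypothetical CollidingRecord returns the same hash for every instance (legal, though it turns every lookup into a linear scan), and Distinct() still gives the right answer because Equals decides which records actually match.

```csharp
using System;
using System.Linq;

// Hypothetical record type with a deliberately degenerate (but valid) hash.
public class CollidingRecord
{
    public string VisitKey { get; set; }
    public DateTime Time { get; set; }

    private static DateTime ToMinute(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0);

    // Every instance collides; slow, but still consistent with Equals.
    public override int GetHashCode() => 42;

    public override bool Equals(object obj) =>
        obj is CollidingRecord other &&
        VisitKey == other.VisitKey &&
        ToMinute(Time) == ToMinute(other.Time);
}

public static class CollisionDemo
{
    public static int Run()
    {
        var start = new DateTime(2024, 1, 1, 10, 30, 0);
        var records = new[]
        {
            new CollidingRecord { VisitKey = "abc", Time = start },
            new CollidingRecord { VisitKey = "abc", Time = start.AddSeconds(40) }, // same minute
            new CollidingRecord { VisitKey = "xyz", Time = start },                // other visit
            new CollidingRecord { VisitKey = "abc", Time = start.AddMinutes(1) },  // next minute
        };
        // All four hashes collide; Equals sorts them into three distinct records.
        return records.Distinct().Count();
    }
}
```

`CollisionDemo.Run()` returns 3: the collisions cost performance, not correctness.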
Upvotes: 2