Reputation: 1630
I have a model class Class1
and I want to compare if two instances of Class1
are same (structural equality).
public class Class1 : IEquatable<Class1>
{
public string Id { get; set; }
public string Name { get; set; }
public IList<Class2> Class2s { get; set; }
public bool Equals(Class1 other)
{
return QuestName.Equals(other.QuestName)
&& Class2s.OrderBy(c => c.Id).SequenceEqual(other.Class2s.OrderBy(c => c.Id));
//Below method is very fast but not so accurate
//because 2 objects with the same hash code may or may not be equal
//return GetHashCode() == other.GetHashCode();
}
public override bool Equals(object obj)
{
return obj is Class1
&& this.Equals(obj as Class1);
}
public override int GetHashCode()
{
unchecked
{
int hash = 13;
hash = (hash * 7) + Name.GetHashCode();
foreach (var c2 in Class2s.OrderBy(c => c.Id))
{
hash = (hash * 7) + c2.GetHashCode();
}
return hash;
}
}
}
public class Class2 : IEquatable<Class2>
{
public int Id { get; set; }
public string Name { get; set; }
public IList<Class3> Class3s { get; set; }
public bool Equals(Class2 other)
{
return Id == other.Id
&& Name.Equals(other.Name)
&& Class3s.OrderBy(c => c.Id).SequenceEqual(other.Class3s.OrderBy(c => c.Id));
}
public override bool Equals(object obj)
{
return obj is Class2
&& this.Equals(obj as Class2 );
}
public override int GetHashCode()
{
unchecked
{
int hash = 13;
hash = (hash * 7) + Id.GetHashCode();
hash = (hash * 7) + Name.GetHashCode();
foreach (var c3 in Class3s.OrderBy(c => c.Id))
{
hash = (hash * 7) + c3.GetHashCode();
}
return hash;
}
}
}
public class Class3 : IEquatable<Class3>
{
public int Id { get; set; }
public string Name { get; set; }
public IList<Class4> Class4s { get; set; }
public bool Equals(Class3 other)
{
return Id == other.Id
&& Name.Equals(other.Name)
&& Class4s.OrderBy(c => c.Id).SequenceEqual(other.Class4s.OrderBy(c => c.Id));
}
public override bool Equals(object obj)
{
return obj is Class3
&& this.Equals(obj as Class3);
}
public override int GetHashCode()
{
unchecked
{
int hash = 13;
hash = (hash * 7) + Id.GetHashCode();
hash = (hash * 7) + Name.GetHashCode();
foreach (var c in Class4s.OrderBy(c => c.Id))
{
hash = (hash * 7) + c.GetHashCode();
}
return hash;
}
}
}
public class Class4 : IEquatable<Class4>
{
public int Id { get; set; }
public string Name { get; set; }
public bool Equals(Class4 other)
{
return Id.Equals(other.Id)
&& Name.Equals(other.Name);
}
public override bool Equals(object obj)
{
return obj is Class4
&& this.Equals(obj as Class4);
}
public override int GetHashCode()
{
unchecked
{
int hash = 13;
hash = (hash * 7) + Id.GetHashCode();
hash = (hash * 7) + Name.GetHashCode();
return hash;
}
}
}
I say that two Class1
objects are equal when:
1. they have the same Name
2. they have the same Class2
objects (their order does not matter)
Two Class2
objects are equal:
1. They have the same Id
2. They Have the same name
3. they have the same Class3
objects (their order does not matter)
Two Class3
objects are equal:
1. They have the same Id
2. They Have the same name
3. they have the same Class4
objects (their order does not matter)
Two Class4
objects are equal:
1. They ave the same Id
2. They Have the same name
I compare them using the Equals
method and measure the run time like this:
Class1 obj1 = GetFirstClass1Object();
Class1 obj2 = GetSecondClass1Object();
var startTime = DateTime.Now;
bool equals = obj1.Equals(obj2);
var elaspedTime = DateTime.Now.Substract(startTime)
the above solution works just fine but it's very slow.
I know that if we flatten obj1
and obj2
, they contain 3500 Class4
objects each and it takes around 12 seconds to compare obj1
and obj2
.
Is there any faster way to do this? Can I somehow make use of hashing to make this faster?
Also, the number of Class2
, Class3
and Class4
objects inside both obj1
and obj2
will always be same
Upvotes: 2
Views: 237
Reputation: 247591
Consider the following structure given the provided classes as examples. There was no sample data based on your example to test it against so you will have to test it with what you have.
public class Class1 : IEquatable<Class1> {
public int Id { get; set; }
public string Name { get; set; }
public IList<Class2> Class2s { get; set; }
public static bool operator ==(Class1 left, Class1 right) {
return Equals(left, right);
}
public static bool operator !=(Class1 left, Class1 right) {
return !(left == right);
}
public bool Equals(Class1 other) {
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(this.ToString(), other.ToString());
}
public override bool Equals(object obj) {
return obj is Class1 other && this.Equals(other);
}
public override int GetHashCode() {
return ToString().GetHashCode();
}
public override string ToString() {
var cs = Class2s == null ? "" : string.Join("", Class2s.OrderBy(_ => _.Id).Select(_ => _.ToString()));
return string.Join("", Id, Name, cs);
}
}
public class Class2 : IEquatable<Class2> {
public int Id { get; set; }
public string Name { get; set; }
public IList<Class3> Class3s { get; set; }
public static bool operator ==(Class2 left, Class2 right) {
return Equals(left, right);
}
public static bool operator !=(Class2 left, Class2 right) {
return !(left == right);
}
public bool Equals(Class2 other) {
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(this.ToString(), other.ToString());
}
public override bool Equals(object obj) {
return obj is Class2 other && this.Equals(other);
}
public override int GetHashCode() {
return ToString().GetHashCode();
}
public override string ToString() {
var cs = Class3s == null ? "" : string.Join("", Class3s.OrderBy(_ => _.Id).Select(_ => _.ToString()));
return string.Join("", Id, Name, cs);
}
}
public class Class3 : IEquatable<Class3> {
public int Id { get; set; }
public string Name { get; set; }
public IList<Class4> Class4s { get; set; }
public static bool operator ==(Class3 left, Class3 right) {
return Equals(left, right);
}
public static bool operator !=(Class3 left, Class3 right) {
return !(left == right);
}
public bool Equals(Class3 other) {
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(this.ToString(), other.ToString());
}
public override bool Equals(object obj) {
return obj is Class3 other && this.Equals(other);
}
public override int GetHashCode() {
return ToString().GetHashCode();
}
public override string ToString() {
var cs = Class4s == null ? "" : string.Join("", Class4s.OrderBy(_ => _.Id).Select(_ => _.ToString()));
return string.Join("", Id, Name, cs);
}
}
public class Class4 : IEquatable<Class4> {
public int Id { get; set; }
public string Name { get; set; }
public static bool operator ==(Class4 left, Class4 right) {
return Equals(left, right);
}
public static bool operator !=(Class4 left, Class4 right) {
return !(left == right);
}
public bool Equals(Class4 other) {
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(this.ToString(), other.ToString());
}
public override bool Equals(object obj) {
return obj is Class4 other && Equals(other);
}
public override int GetHashCode() {
return ToString().GetHashCode();
}
public override string ToString() {
return string.Format("{0}{1}", Id, Name);
}
}
The structure for all the objects are similar except for Class4
obviously, since it has no inner list.
Though just an example, a lot of the repeated code could be refactored into a common base class.
Upvotes: 1
Reputation: 977
I've done some BenchmarkDotNet benchmarks on your code and ideas I've had to optimize your code.
For each test, I've created 1 instance of Class1
, which has 150 children of type Class2
, each of which has 150 children of type Class3
, each of which has 150 children of type Class4
.
I've measured comparing an object to itself because comparing different objects is going to be much faster since any comparison that returns false shortcuts the whole thing. Also, there are no ReferenceEquals()
shortcuts, hence I didn't bother cloning the object.
| Method | Mean | Error | Ratio |
|----------------------------------------------------------------------- |------------:|------:|------:|
| 'Original code' | 535.46 ms | NA | 1.00 |
| 'Custom dictionary-based SequenceEquals' | 6,606.23 ms | NA | 12.34 |
| 'Custom dictionary-based SequenceEquals, classes cache their HashCode' | 1,136.91 ms | NA | 2.12 |
| 'Custom Except()-based SequenceEquals' | 2,281.12 ms | NA | 4.26 |
| 'Custom Except()-based SequenceEquals, classes cache their HashCode' | 257.46 ms | NA | 0.48 |
| 'No OrderBy()' | 76.31 ms | NA | 0.14 |
Original code
: It's your code. I'm using it as a baseline for comparison.Custom dictionary-based SequenceEquals
: Then, I've tried to optimize list equality comparison. First, I've tried a Dictionary
solution inspired by this answer. Turns out, it was 12 times slower, because Dictionary
has to frequently calculate hashcode, and hashcode in our case means iterating over children and nested children.Custom dictionary-based SequenceEquals, classes cache their HashCode
: I figured it's possible to do better if I start to cache the hashcode. The Dictionary
-based solution is now only twice as slow as the original.Custom Except()-based SequenceEquals
: Then, there's the Except()
method. Behind the scenes, it creates something like a HashSet. As far as I understand, it only needs to calculate hashcode once for each element of both enumerable. That solution takes 4.26x time that the original does.Custom Except()-based SequenceEquals, classes cache their HashCode
: Same as previously, I start caching hashcode thus only really calculating it once for every object. The resulting solution takes 0.48x time of the original. Not bad.No OrderBy()
: Then I've stopped using the OrderBy()
, just SequenceEquals()
, and given that I'm comparing an object to itself, you could say that the data is already sorted, so it's safe to compare like that :-). The resulting solution is a massive speedup taking 0.14x time of the original.Your best option is reviewing your model and requirements, do you really need to compare huge object graphs like this? If you really have to:
Except()
-based comparison. Be careful, as Set-based solution assumes you don't care about duplicates, you'll have to compare list Count
's prior to Except()
.;OrderBy()
and use plain SequenceEquals()
comparison. It's a tradeoff because inserts are going to be more expensive. See, if this is applicable to your scenario.Uploaded my code and measurements to this repo.
Upvotes: 4
Reputation: 19702
Sorting the lists just to compare them seems pretty inefficient to me. You could try using other methods to compare the lists
Instead of
Class2s.OrderBy(c => c.Id).SequenceEqual(other.Class2s.OrderBy(c => c.Id)
You could try something like
!Class2s.Except(other.Class2s).Any()
If most of the objects are not equal, you could also add an extra test to be sure the lists are not looped when their size is not the same:
Class2s.Count == other.Class2s.Count && !Class2s.Except(other.Class2s).Any()
Of course, you can do the same for the Class2.Equals() and Class3.Equals methods as well.
Upvotes: 0