Colin Desmond

Reputation: 4854

Checksumming objects in memory

Let's say I have a class A that inherits from class B in C#. Class B has a property called Checksum which, when read at runtime, calculates the checksum of all the properties on an instance of class A (the particular checksum algorithm used is not important; probably one from the BCL).

Importantly, the checksum algorithm must ignore the checksum property otherwise it will fail when validated later (as the checksum value will have changed).

So, as far as I can see it, there are two options:

1) Iterate over all the public properties of the object using reflection, concatenate into a string and checksum that.

2) Pretend that the object is simply a bunch of contiguous memory addresses and treat that as a byte array and checksum that.

Option 1 sounds slow. Option 2 sounds difficult, as I am not sure how you'd get it to ignore the string that represents the checksum itself, or how references to other objects are handled.

Does anyone have any better ideas than option 1, which sounds like the better of these two solutions?
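For reference, option 1 might look roughly like the sketch below. It assumes SHA-256 as the BCL algorithm, and the Id and Name properties on A are hypothetical, added only for illustration. The Checksum property is skipped by name, and properties are sorted so the result does not depend on reflection order.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public abstract class B
{
    // Computed on demand; excluded from its own input by name below.
    public string Checksum
    {
        get
        {
            var sb = new StringBuilder();

            // Sort by name so the result does not depend on reflection order.
            var props = GetType()
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Where(p => p.Name != nameof(Checksum)
                            && p.CanRead
                            && p.GetIndexParameters().Length == 0)
                .OrderBy(p => p.Name, StringComparer.Ordinal);

            foreach (PropertyInfo p in props)
            {
                var value = p.GetValue(this);
                // Invariant culture keeps numbers and dates stable across machines.
                string text = Convert.ToString(value, CultureInfo.InvariantCulture) ?? "";
                sb.Append(p.Name).Append('=').Append(text).Append(';');
            }

            using (var sha = SHA256.Create())
            {
                byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
                return BitConverter.ToString(hash).Replace("-", "");
            }
        }
    }
}

// Hypothetical derived class, for illustration only.
public class A : B
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```

Properties that reference other objects are reduced to whatever their ToString() returns here, so nested objects would need their own handling, and values containing ';' or '=' could make two different objects produce the same input string.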

Upvotes: 2

Views: 3720

Answers (3)

Paul Ruane

Reputation: 38580

Why does it have to be a property? If it were a method, GetChecksum(), you would not need any special logic to keep it out of the checksum calculation. What you have described is pretty much exactly what the existing GetHashCode() method is for; just provide an implementation of that instead.

Typically one would code GetHashCode() for each class explicitly, although a quick web search will reveal approaches that use reflection to provide a generic (though slower) mechanism. Usually one takes each field to be included in the hash code, converts it to an integer, and combines it with a running result that is multiplied by a fixed number at each step, so that objects with different field values give hash codes that are well spread across the integer range.

As an example, Resharper generates GetHashCode() methods that look like this:

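public override int GetHashCode()
{
    // Overflow is expected here, so let the arithmetic wrap instead of throwing.
    unchecked
    {
        int result = a;
        // Multiply the running value by 397, then XOR in the next field's hash code.
        result = (result * 397) ^ (b != null ? b.GetHashCode() : 0);
        result = (result * 397) ^ c.GetHashCode();
        return result;
    }
}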

Here a is an int, b is a string and c is a long. At each step the interim value (result) is multiplied by 397 and XORed with the next component's hash code (in C#, ^ is bitwise XOR, not exponentiation). The unchecked block means that if the integer overflows (which is likely), the overflow is discarded and the value wraps around. This should give reasonable coverage of the integer space in most cases, though I would recommend testing the coverage, as a poor hash code can have serious consequences for the performance of your system.

Care should be taken to handle zeroes of any field so that you do not multiply by zero and end up with a large number of objects that all have a zero hash-code.
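One common way to guard against that is to start from a non-zero seed. The sketch below is an assumption rather than ReSharper's output: the class name Sample and the seed 17 are arbitrary choices for illustration, and it combines with addition instead of XOR.

```csharp
using System;

// Hypothetical class with the same three field types as the example above.
public sealed class Sample
{
    private readonly int a;
    private readonly string b;
    private readonly long c;

    public Sample(int a, string b, long c)
    {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public override bool Equals(object obj) =>
        obj is Sample other && a == other.a && b == other.b && c == other.c;

    public override int GetHashCode()
    {
        unchecked
        {
            // Non-zero seed: an object whose fields are all zero or null
            // still ends up with a non-zero hash code.
            int result = 17;
            result = result * 397 + a;
            result = result * 397 + (b != null ? b.GetHashCode() : 0);
            result = result * 397 + c.GetHashCode();
            return result;
        }
    }
}
```

On newer frameworks System.HashCode.Combine does this kind of mixing for you. Bear in mind that string.GetHashCode() is randomized per process on .NET Core and later, so a value built this way is not suitable as a checksum that has to be stored and validated by another process.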

Upvotes: 2

Niki

Reputation: 15867

Option 3 would be to create a method on the fly that calculates the checksum of all properties, e.g. by using Reflection.Emit. This is slow only on the first call, because the generated method can be cached. If you know which types have to be checksummed, you could also use code generation to create checksum methods for them at compile time.
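A rough sketch of the idea, using expression trees (System.Linq.Expressions) rather than raw Reflection.Emit because they are easier to get right; the principle of generating once and caching is the same. ChecksumBuilder and Order are hypothetical names, and the checksum here is just a combined hash code, so swap in whatever algorithm you actually need.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class ChecksumBuilder
{
    // One compiled delegate per type; built on first use, reused afterwards.
    private static readonly ConcurrentDictionary<Type, Func<object, int>> Cache =
        new ConcurrentDictionary<Type, Func<object, int>>();

    public static int Compute(object instance) =>
        Cache.GetOrAdd(instance.GetType(), Build)(instance);

    private static Func<object, int> Build(Type type)
    {
        ParameterExpression obj = Expression.Parameter(typeof(object), "obj");
        ParameterExpression typed = Expression.Variable(type, "typed");
        ParameterExpression result = Expression.Variable(typeof(int), "result");
        MethodInfo getHashCode = typeof(object).GetMethod(nameof(object.GetHashCode));

        var body = new List<Expression>
        {
            Expression.Assign(typed, Expression.Convert(obj, type)),
            Expression.Assign(result, Expression.Constant(17)),
        };

        // Skip the Checksum property itself (assumed name) and any indexers.
        var props = type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.Name != "Checksum" && p.CanRead && p.GetIndexParameters().Length == 0)
            .OrderBy(p => p.Name, StringComparer.Ordinal);

        foreach (PropertyInfo p in props)
        {
            // Box the value so null references and value types share one code path.
            Expression boxed = Expression.Convert(Expression.Property(typed, p), typeof(object));
            Expression hash = Expression.Condition(
                Expression.Equal(boxed, Expression.Constant(null, typeof(object))),
                Expression.Constant(0),
                Expression.Call(boxed, getHashCode));

            // result = result * 397 + hash (Add/Multiply do not check for overflow).
            body.Add(Expression.Assign(
                result,
                Expression.Add(Expression.Multiply(result, Expression.Constant(397)), hash)));
        }

        body.Add(result); // the value of the block is its last expression
        BlockExpression block = Expression.Block(new[] { typed, result }, body);
        return Expression.Lambda<Func<object, int>>(block, obj).Compile();
    }
}

// Hypothetical type, for illustration only.
public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }
    public int Checksum => ChecksumBuilder.Compute(this);
}
```

The first read of Checksum for a given type pays for reflection and compilation; later reads only invoke the cached delegate.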

Upvotes: 1

Giorgi

Reputation: 30873

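You can mark the checksum as NonSerialized, serialize the instance of the class to a byte array, and then compute the checksum over those bytes. That way the checksum is ignored during serialization. Note that [NonSerialized] can only be applied to fields, so it has to go on the property's backing field rather than on the property itself.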
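As a sketch of the same idea with a different serializer: the example below uses System.Text.Json and [JsonIgnore] instead of [NonSerialized] with binary serialization, and SHA-256 as the checksum. Those choices, and the Id and Name properties on A, are assumptions for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text.Json;
using System.Text.Json.Serialization;

public abstract class B
{
    [JsonIgnore] // keeps the checksum out of its own input
    public string Checksum
    {
        get
        {
            // Serialize as the runtime type so properties declared on A are included.
            byte[] payload = JsonSerializer.SerializeToUtf8Bytes(this, GetType());
            using (var sha = SHA256.Create())
            {
                return Convert.ToBase64String(sha.ComputeHash(payload));
            }
        }
    }
}

// Hypothetical derived class, for illustration only.
public class A : B
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```

Because the checksum is computed over the serialized bytes, it stays valid only as long as the serializer produces the same output for the same object, so changes to serializer options or property order would change the result.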

Upvotes: 5
