Reputation: 1
background: I have two arrays of objects and I want to know if these two arrays are same. Can I calculate md5 of all objects and sum them up to do one comparison?
basically, if I know md5(a)=md5(b), then practically a=b
but if I know md5(a) + md5(b) = md5(c) + md5(d), can I say a=c and b=d ?
Upvotes: 0
Views: 353
Reputation: 13196
I have two arrays of objects and I want to know if these two arrays are same. Can I calculate md5 of all objects and sum them up to do one comparison?
Is this an optimization?
If you only have 2 arrays you won't benefit from this since calculating the hash of a block of memory is an O(n) operation. If you're going to compare the array once, a naive approach of simply comparing the length and each element will be faster.
but if I know md5(a) + md5(b) = md5(c) + md5(d), can I say a=c and b=d?
No. If you need a fast way to eliminate a huge number of cases, you might use it as an initial guess, since for a == c
and b == d
to be true, md5(a) == md5(c)
and md5(b) == md5(d)
will also necessarily be true. However, it's not a certainty: There exist circumstances in which the md5 check will succeed but the arrays will not be equal. If you decide to rely on this check, you will need to ensure that you weed out such false positives.
Also, order becomes irrelevant if you take the sum. In other words, you end up with a few different situations in which the sums may be equal:
a == c
and b == d
(expected)a == d
and b == c
(values are swapped)a != c
and a != d
and b != c
and b != d
(coincidental false positive)You will need to account for all of these for this heuristic to be useful.
Upvotes: 1