Reputation: 1028

Checksum for an integer array?

I have an array that is of size 4,9,16 or 25 (according to the input) and the numbers in the array are the same but less by one (if the array size is 9 then the biggest element in the array would be 8) the numbers start with 0 and I would like to do some algorithm to generate some sort of a checksum for the array so that I can compare that 2 arrays are equal without looping through the whole array and checking each element one by one.

Where can I get this sort of information? I need something that is as simple as possible. Thank you.

edit: just to be clear on what I want:

-All the numbers in the array are distinct, so [0,1,1,2] is not valid because there is a repeated element (1)

-The position of the numbers matter, so [0,1,2,3] is not the same as [3,2,1,0]

-The array will contain the number 0, so this should also be taken into consideration.

EDIT:

Okay I tried to implement the Fletcher's algorithm here: http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Straightforward

int fletcher(int array[], int size){
  int i;
  int sum1=0;
  int sum2=0;
  for(i=0;i<size;i++){
    sum1=(sum1+array[i])%255;
    sum2=(sum2+sum1)%255;
  }
  return (sum2 << 8) | sum1;
}

to be honest I have no idea what does the return line do but unfortunately, the algorithm does not work. For arrays [2,1,3,0] and [1,3,2,0] I get the same checksum.

EDIT2:

okay here's another one, the Adler checksum http://en.wikipedia.org/wiki/Adler-32#Example_implementation

#define MOD 65521;

unsigned long adler(int array[], int size){
  int i;
  unsigned long a=1;
  unsigned long b=0;
  for(i=0;i<size;i++){
    a=(a+array[i])%MOD;
    b=(b+a)%MOD;
  }
  return (b <<16) | a;
}

This also does not work. Arrays [2,0,3,1] and [1,3,0,2] generate same checksum. I'm losing hope here, any ideas?

Upvotes: 7

Answers (5)

imdadable

Reputation: 147

How about the checksum of weighted sum? Let's take an example for [0,1,2,3]. First pick a seed and limit, let's pick a seed as 7 and limit as 10000007.

a[4] = {0, 1, 2, 3}
limit = 10000007, seed = 7
result = 0
result = ((result + a[0]) * seed) % limit = ((0 + 0) * 7)) % 10000007 = 0
result = ((result + a[1]) * seed) % limit = ((0 + 1) * 7)) % 10000007 = 7
result = ((result + a[2]) * seed) % limit = ((7 + 2) * 7)) % 10000007 = 63
result = ((result + a[3]) * seed) % limit = ((63 + 3) * 7)) % 10000007 = 462

Your checksum is 462 for that [0, 1, 2, 3]. The reference is http://www.codeabbey.com/index/wiki/checksum

Upvotes: 1

Nicolas Repiquet

Reputation: 9265

Let's take the case of your array of 25 integers. You explain that it can contains any permutations of the unique integers 0 to 24. According to this page, there is 25! (25 factorial) possible permutations, that is 15511210043330985984000000. Far more than a 32bit integer can contains.

The conclusion is that you will have collision, no matter how hard you try.

Now, here is a simple algorithm that account for position:

int checksum(int[] array, int size) {
  int c = 0;
  for(int i = 0; i < size; i++) {
    c += array[i];
    c = c << 3 | c >> (32 - 3); // rotate a little
    c ^= 0xFFFFFFFF; // invert just for fun
  }
  return c;
}

Upvotes: 5

UmNyobe

Reputation: 22920

From what I understand your array contains a permutation of numbers from 0 to N-1. One check-sum which will be useful is the rank of the array in its lexicographic ordering. What does it means ? Given 0, 1, 2 You have the possible permutations

  1: 0, 1, 2
  2: 0, 2, 1
  3: 1, 0, 2
  4: 1, 2, 0
  5: 2, 0, 1
  6: 2, 1, 0

The check-sum will be the first number, and computed when you create the array. There are solutions proposed in

Find the index of a given permutation in the list of permutations in lexicographic order

which can be helpful, although it seems the best algorithm was of quadratic complexity. To improve it to linear complexity you should cache the values of the factorials before hand.

The advantage? ZERO collision.

EDIT: Computation The value is like the evaluation of a polynomial where factorial is used for the monomial instead of power. So the function is

f(x0,....,xn-1) = X0 * (0!) + X1 * (1!) + X2 * (2!) +...+ Xn-1 * (n-1!)

The idea is to use each values to get a sub-range of permutations, and with enough values you pinpoint an unique permutation.

Now for the implementation (like the one of a polynomial):

pre compute 0!.... to n-1! at the beginning of the program
Each time you set an array you use f(elements) to compute its checksum
you compare in O(1) using this checksum

Upvotes: 0

Philosophus42

Reputation: 472

I think what you want is in the answer of the following thread:

Fast permutation -> number -> permutation mapping algorithms

You just take the number your permutation is mapped to and take that as your Checksum. As there is exactly one Checksum per permutation there can't be a smaller Checksum that is collision free.

Upvotes: 1

Yusuf X

Reputation: 14633

For an array of N unique integers from 1 to N, just adding up the elements will always be N*(N+1)/2. Therefore the only difference is in the ordering. If by "checksum" you imply that you tolerate some collisions, then one way is to sum the differences between consecutive numbers. So for example, the delta checksum for {1,2,3,4} is 1+1+1=3, but the delta checksum for {4,3,2,1} is -1+-1+-1=-3.

No requirements were given for collision rates or computational complexity, but if the above doesn't suit, then I recommend a position dependent checksum

Upvotes: 0

Checksum for an integer array?

Answers (5)

Related Questions