Abhinav
Abhinav

Reputation: 1740

How to generate rolling checksum for overlapping chunks?

I came recently across Rsync algorithm and thought of implementing it using java.One of the important part of this algorithm is Rolling Checksum at the Sender's Side.

In http://en.wikipedia.org/wiki/Rsync ,its explained that

"if one had already calculated the rolling checksum of bytes 1–25, one could calculate >the rolling checksum of bytes 2–26 solely from the previous checksum (R), byte 1 (n), and >byte 26 (n+S)."

I can generate checksum for a file or a string using MD5 or SHA.But I wanted light on this line as how can we implement it.

Upvotes: 2

Views: 799

Answers (1)

LoneRanger
LoneRanger

Reputation: 61

Let's assume that your rolling window covers 3 bytes, and that our input string is of 5 bytes. Consider the string 23456. We'll use a simple hash function : if the window covers bytes a,b and c, then the hash is a x 100 + b x 10 + c.

So, for our input string 2345, the checksum of the first 3 bytes is 2 x 100 + 3 x 10 + 4 = 234.

Next, the window moves one step to the left, covering 3, 4 and 5 now. Instead of calculating 3 x 100 + 4 x 10 + 5, we can use the previous checksum and our knowledge of the numbers that have just entered and left the window, 5 and 2 respectively.

So, we know that 2 has just left the window, we subtract 2 x 100 from 234, getting 34. Multiply 34 by 10 and add 5. This gives us the new hash, 345, without having to iterate over all the elements present in the new window. For the next sequence of bytes, we can use the same method and avoid calculating the hash value by iterating over all the bytes in the window.

Upvotes: 4

Related Questions