Adams

Matching a large number of chunk names periodically

We have a system that periodically (say, every 30 minutes) verifies that two locations hold the same data. We can assume the data is made of chunks and each chunk has a unique name. Currently the system performs the check by querying the chunk names from both locations and matching them against each other. Because there are a lot of chunks, the system spends a lot of time fetching chunk names from the database and sending them over to the matcher.

Is there something out there I can use to optimize this, so that we do not need to send the full list of chunk names each time?

If the chunks were static, we could just compute a CRC32 and send that, and only if it did not match would we query the chunks. But in our system chunks can be added or deleted at any time, so we need something like a running checksum to which we can add or subtract a chunk name. I thought about a Bloom filter, but it will not work for us because it can generate false positives. We need to be sure.
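One way to get the running checksum described above is an additive (order-independent) set hash: hash each chunk name and keep the sum of those hashes modulo a large number, so adding a chunk adds its hash and deleting it subtracts it. The sketch below is a hypothetical illustration, not an existing library; the names `RunningSetDigest` and `bucket_of`, and the choice of SHA-256 with 256 buckets, are assumptions. Splitting the digest into buckets means that on a mismatch only the names in the differing buckets need to be fetched. A mismatch proves the sets differ; a match is not strictly a proof of equality, but with 256-bit hashes the chance of an accidental collision is negligible (this scheme is not designed to resist deliberately crafted collisions).

```python
import hashlib
from typing import List

# Hypothetical sketch of an additive (order-independent) set digest.
# Assumptions: chunk names are unique strings, and remove() is only
# called for names that were previously added.
MODULUS = 1 << 256  # bucket sums are kept modulo 2**256


def _element_hash(name: str) -> int:
    """Map a chunk name to a 256-bit integer using SHA-256."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")


class RunningSetDigest:
    """Bucketed running digest that supports O(1) add and remove."""

    def __init__(self, num_buckets: int = 256) -> None:
        self.num_buckets = num_buckets
        self.buckets = [0] * num_buckets  # one running sum per bucket

    def bucket_of(self, name: str) -> int:
        """Bucket index for a name; both locations must compute it the same way."""
        return _element_hash(name) % self.num_buckets

    def add(self, name: str) -> None:
        h = _element_hash(name)
        b = h % self.num_buckets
        self.buckets[b] = (self.buckets[b] + h) % MODULUS

    def remove(self, name: str) -> None:
        h = _element_hash(name)
        b = h % self.num_buckets
        self.buckets[b] = (self.buckets[b] - h) % MODULUS

    def total(self) -> int:
        """Digest of the whole set; independent of insertion order."""
        return sum(self.buckets) % MODULUS

    def differing_buckets(self, other: "RunningSetDigest") -> List[int]:
        """Indices of buckets whose sums differ; only these need re-fetching."""
        if self.num_buckets != other.num_buckets:
            raise ValueError("both digests must use the same number of buckets")
        return [i for i, (a, b) in enumerate(zip(self.buckets, other.buckets)) if a != b]


if __name__ == "__main__":
    site_a = RunningSetDigest()
    site_b = RunningSetDigest()
    for name in ["chunk-001", "chunk-002", "chunk-003"]:
        site_a.add(name)
        site_b.add(name)

    print(site_a.total() == site_b.total())  # True

    # Location B loses one chunk and gains another.
    site_b.remove("chunk-002")
    site_b.add("chunk-004")
    print(site_a.total() == site_b.total())  # False
    print(site_a.differing_buckets(site_b))  # buckets holding chunk-002 / chunk-004
```

In this sketch each location would update its digest whenever a chunk is added or deleted and send only the 256 bucket values (or just `total()` first). On a mismatch, the matcher fetches chunk names only for the buckets returned by `differing_buckets`, which assumes the database can look chunks up by bucket, for example by storing `bucket_of(name)` in an indexed column.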
