Tejas Vora
Tejas Vora

Reputation: 548

Redis: Implementing fixed Size Set/List to check duplicates

I would like to implement following using Redis:

  1. We have incoming activities. Each activity has unique Id
  2. We would like to check if incoming activity is duplicate using it's Id
  3. We only want to maintain last X number of activity Ids. X number is not a hard-coded requirement. Meaning having couple of extra Ids is not a problem.
  4. Each incoming activity also has time-stamp.

Not sure how to implement this.

  1. I can either use List - using LPUSH and LTRIM - I can maintain fixed size - however, I would not be able to check for duplicate easily.
  2. I can use SET - which allows me to check for duplicate very easily before processing incoming activity - however, I am not sure how to limit the size of a SET.

Which data structure I should use - which would allow me to easily check for duplicate and also maintain the fix size.

I am using StackExchange.Redis library.

Upvotes: 2

Views: 896

Answers (1)

Itamar Haber
Itamar Haber

Reputation: 50122

TL;DR neither - use the Sorted Set instead and set the elements' (activity ids) score to the timestamp (i.e. Unix epoch).

As you've pointed out, Lists are a bad choice for detecting duplicates because you'll be doing it in O(N) complexity. Sets are perfect for exact duplicate detection (actually, Hashes can be used as well), and you can call SPOP with count of Y whenever the set's cardinality (SCARD) exceeds your X, or in pseudo code:

y = SCARD key - x
if y > 0 then
  SPOP key y
end

However, SPOP is random and you've mentioned that you have a timestamp. In many cases, it is more practical to limit the set's size by retaining only the newest elements instead of being undeterministic. For that, use the Sorted Set and ZREMRANGEBYRANK will let you keep the set's cardinality (ZCARD) at an arbitrary X value by throwing away the oldest activities. Note that ranking is from lowest to highest so you'll need use a rank range from from 0 (the first, lowest-scored element) -X (the Xth+1 element from the last, highest-scored element), i.e. only this:

  ZREMRANGEBYRANK key 0 -x

Upvotes: 4

Related Questions