Chemary
Chemary

Reputation: 1333

Redis: best way to count occurrence of item in sets

I have a list of 40K items that may be present on 3K sets, I want to count in how many sets each of this items are present.

The simple algorithm in Python and omitting some pipeline optimizations for simplicity is this:

ids = [1,2,3,4,5]
set1 = (1,3)
set2 = (2,3)
set3 = (4,5)
sets = [set1, set2, set3]

ids_count = {}
for id in ids:
    ids_count[id] = sum([redis.sismember(id, set) for set in sets])

It will need 120M redis calls or 3K using pipelining, both are really slow. There is any better way to do it without changing how data is stored (I already have the list of ids and the list of sets on redis)

Upvotes: 0

Views: 361

Answers (1)

Emilia Bopp
Emilia Bopp

Reputation: 883

I think the most efficient way is to download the whole thing (all sets and all ids, which you'll end up doing anyway, as you describe it) and then do everything in memory. Furthermore your algorithm is likely going to be more efficient iterating over sets without the membership check, like so:

# ...
for set in sets:
    for id in set:
        if id not in ids_count:
            ids_count[id] = 0
        ids_count[id] += 1

This is assuming your performance is network-bound. But you have to benchmark this to be sure.

Upvotes: 1

Related Questions