Redis: best way to count occurrence of item in sets

Question

I have a list of 40K items that may be present on 3K sets, I want to count in how many sets each of this items are present.

The simple algorithm in Python and omitting some pipeline optimizations for simplicity is this:

ids = [1,2,3,4,5]
set1 = (1,3)
set2 = (2,3)
set3 = (4,5)
sets = [set1, set2, set3]

ids_count = {}
for id in ids:
    ids_count[id] = sum([redis.sismember(id, set) for set in sets])

It will need 120M redis calls or 3K using pipelining, both are really slow. There is any better way to do it without changing how data is stored (I already have the list of ids and the list of sets on redis)

Emilia Bopp · Accepted Answer

I think the most efficient way is to download the whole thing (all sets and all ids, which you'll end up doing anyway, as you describe it) and then do everything in memory. Furthermore your algorithm is likely going to be more efficient iterating over sets without the membership check, like so:

# ...
for set in sets:
    for id in set:
        if id not in ids_count:
            ids_count[id] = 0
        ids_count[id] += 1

This is assuming your performance is network-bound. But you have to benchmark this to be sure.

Redis: best way to count occurrence of item in sets

Answers (1)

Related Questions