Siva Arunachalam

Reputation: 7760

Redis as a Queue - Bulk Retrieval

Our Python application serves around 2 million API requests per day. We have a new business requirement to generate a daily report containing the count of unique requests and responses.

The simplest option is to use LPUSH and RPOP. But RPOP returns one value at a time, which hurts performance. Is there any way to do a bulk pop from Redis?
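
For reference, a minimal sketch of that one-at-a-time approach (assuming a local redis-py client; the function names are illustrative):

import redis

r = redis.Redis("localhost")

# API side: push each incoming request onto the list.
def record_request(payload):
    r.lpush("requests", payload)

# Reporting side: RPOP returns a single value per call,
# so draining the list costs one round trip per item.
def drain_requests():
    items = []
    while True:
        item = r.rpop("requests")
        if item is None:  # list is empty
            break
        items.append(item)
    return items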

Other suggestions for this scenario would also be appreciated.

Upvotes: 2

Views: 3730

Answers (4)

Fayyaz Ali

Reputation: 787

The original question was about a Redis list. You can use LRANGE to get all of the values in a single call; for example:

import redis

r_server = redis.Redis("localhost")
r_server.rpush("requests", "Adam")
r_server.rpush("requests", "Bob")
r_server.rpush("requests", "Carol")

print(r_server.lrange("requests", 0, -1))  # all values in the list
print(r_server.llen("requests"))           # number of values in the list
print(r_server.lindex("requests", 1))      # value at index 1

Upvotes: 0

Pascal Le Merrer

Reputation: 5981

Another approach would be to use the HyperLogLog data structure. It was designed for exactly this kind of use case.

It counts unique items with a low error margin (a standard error of 0.81%) and very low memory usage.

Using an HLL is really simple:

PFADD myHll "<request1>"
PFADD myHll "<request2>"
PFADD myHll "<request3>"
PFADD myHll "<request4>"

Then to get the count:

PFCOUNT myHll
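
The same thing from Python with redis-py looks like this (a sketch; the key name and request strings are placeholders, and a local Redis instance is assumed):

import redis

r = redis.Redis("localhost")

# Add each request to the HyperLogLog; re-adding a value does not grow the estimate.
r.pfadd("myHll", "<request1>")
r.pfadd("myHll", "<request2>")
r.pfadd("myHll", "<request2>")  # duplicate, does not change the count

# Approximate number of unique requests recorded.
print(r.pfcount("myHll"))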

Upvotes: 0

Pascal Le Merrer

Reputation: 5981

A simple solution would be to use Redis pipelining.

In a single request you can issue multiple RPOP instructions. Most Redis clients support it; in Python with redis-py it looks like this:

import redis

r = redis.Redis("localhost")

pipe = r.pipeline()
# The following RPOP commands are buffered client-side.
pipe.rpop('requests')
pipe.rpop('requests')
pipe.rpop('requests')
pipe.rpop('requests')
# EXECUTE sends all buffered commands to the server in a single round trip and
# returns a list of responses, one for each command.
results = pipe.execute()
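
To drain the whole list this way, one option (a sketch; the key name and batch size are arbitrary) is to repeat fixed-size pipelined batches until RPOP starts returning None:

import redis

r = redis.Redis("localhost")

def pop_batch(key, batch_size=100):
    pipe = r.pipeline()
    for _ in range(batch_size):
        pipe.rpop(key)
    # Once the list is empty, RPOP returns None; filter those out.
    return [item for item in pipe.execute() if item is not None]

batch = pop_batch('requests')
while batch:
    # process the popped requests here
    batch = pop_batch('requests')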

Upvotes: 3

Philip P.

Reputation: 2394

You can approach this from a different angle. Your requirement is:

requirement ... to generate a daily report containing the count of unique requests and responses.

Rather than storing requests in a list and post-processing the results, why not use Redis features that address the actual requirement directly and avoid the bulk LPUSH/RPOP problem altogether?

If all we want is to count unique requests, then you may want to consider using sorted sets.

It could work like this:

Collect the request statistics

# Collect the request statistics in a sorted set.
# The key embeds the date so the report can be done "by date",
# e.g. 'requests:2015-06-29'.
key = 'requests:date'
# redis-py >= 3.0 signature: zincrby(name, amount, value)
r.zincrby(key, 1, request)
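
A slightly fuller sketch of the collection step, assuming redis-py >= 3.0 (where ZINCRBY takes the amount before the member) and an illustrative per-day key format:

import datetime
import redis

r = redis.Redis("localhost")

def record_request(request_id):
    # One sorted set per day, e.g. 'requests:2015-06-29'.
    key = 'requests:' + datetime.date.today().isoformat()
    # Increment this request's counter; a new member starts at 1.
    r.zincrby(key, 1, request_id)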

Report request statistics

  • You can use ZSCAN to iterate over all members in batches, but the results are unordered.
  • You can use ZRANGE to get all members (or a slice of them) in one call, ordered by score.

Python code:

# ZSCAN: iterate over all members of the sorted set in batches of about 10.
# The results come back unordered; zscan_iter yields (member, score) tuples.
batch_size = 10
for member, score in r.zscan_iter(key, match=None, count=batch_size):
    print(member, '-->', score)


# ZRANGE: get all members of the sorted set in one call, ordered by score.
# max_rank = -1 means "no upper bound".
min_rank = 0
max_rank = -1
for member, score in r.zrange(key, min_rank, max_rank, desc=False, withscores=True):
    print(member, '-->', score)

Benefits of this approach

  1. Solves the actual requirement - reports on the count of unique requests by day.

  2. No need to post-process anything.

  3. Can do additional queries like "top requests" out of the box (see the sketch below) :)
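
For example, a "top 10 requests" report for a given day could look like this (a sketch; the key follows the per-day format used above):

# Top 10 most frequent requests for the day, highest count first.
day_key = 'requests:2015-06-29'
for member, score in r.zrevrange(day_key, 0, 9, withscores=True):
    print(member, '-->', score)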

Upvotes: 1
