Reputation: 7760
Our Python application serves around 2 million API requests per day. We got a new requirement from the business to generate a report which should contain the count of unique requests and responses every day.
The simplest option is to use LPUSH and RPOP, but RPOP returns one value at a time, which will hurt performance. Is there any way to do a bulk pop from Redis?
Other suggestions for the scenario would be highly appreciated.
Upvotes: 2
Views: 3730
Reputation: 787
The actual question was about Redis lists. You can use LRANGE to get all the values in a single call. Note that LRANGE only reads the list; it does not remove anything. Here is an example:
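import redis
# Connect to a Redis server on localhost (default port).
r_server = redis.Redis("localhost")
# Append three values to the tail of the "requests" list.
r_server.rpush("requests", "Adam")
r_server.rpush("requests", "Bob")
r_server.rpush("requests", "Carol")
# Fetch the whole list in one call (0 is the first index, -1 the last).
print(r_server.lrange("requests", 0, -1))
# Number of items in the list.
print(r_server.llen("requests"))
# Item at index 1.
print(r_server.lindex("requests", 1))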
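Since LRANGE leaves the items in place, one common way to turn it into a bulk pop is to pair it with LTRIM inside a transaction. This is only a minimal sketch: the helper name pop_batch is made up, and client is assumed to be a redis-py connection whose pipeline() wraps the commands in MULTI/EXEC (the redis-py default).

```python
def pop_batch(client, key, batch_size):
    """Read up to batch_size items from the head of a list and remove them.

    `client` is assumed to be a redis-py client; `pop_batch` itself is a
    hypothetical helper, not part of redis-py.
    """
    if batch_size <= 0:
        return []
    pipe = client.pipeline()             # buffered, sent as one MULTI/EXEC
    pipe.lrange(key, 0, batch_size - 1)  # read the first batch_size items
    pipe.ltrim(key, batch_size, -1)      # keep only the items after them
    items, _ = pipe.execute()            # one round trip, two replies
    return items
```

With a real connection this would be called as pop_batch(redis.Redis("localhost"), "requests", 1000). It takes items from the head of the list, so it keeps insertion order when the producer uses RPUSH as in the example above.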
Upvotes: 0
Reputation: 5981
Another approach would be to use the HyperLogLog data structure. It was designed for exactly this kind of use case.
It counts unique items with a low standard error (0.81%) and very low memory usage (at most 12 KB per key).
Using HLL is really simple:
PFADD myHll "<request1>"
PFADD myHll "<request2>"
PFADD myHll "<request3>"
PFADD myHll "<request4>"
Then to get the count:
PFCOUNT myHll
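From Python, the same two commands are exposed by redis-py as pfadd and pfcount. Here is a minimal sketch, assuming one HyperLogLog per day. The key prefix unique_requests: and both function names are my own invention, and client is assumed to be a redis-py connection.

```python
def record_unique(client, day, request_id):
    """Add one request identifier to the HyperLogLog for the given day.

    `day` is a string such as "2024-01-15"; the key layout is an assumption.
    """
    client.pfadd("unique_requests:" + day, request_id)


def unique_count(client, day):
    """Return the approximate number of distinct identifiers seen that day."""
    return client.pfcount("unique_requests:" + day)
```

Keeping one key per day means the daily report is a single PFCOUNT, and old days can be expired or deleted independently.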
Upvotes: 0
Reputation: 5981
A simple solution would be to use Redis pipelining.
Pipelining lets you send multiple RPOP commands in a single round trip. Most Redis drivers support it. In Python with redis-py it looks like this:
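import redis
# Assumes a Redis server on localhost; adjust the host as needed.
r = redis.Redis("localhost")
pipe = r.pipeline()
# The following RPOP commands are buffered on the client
pipe.rpop('requests')
pipe.rpop('requests')
pipe.rpop('requests')
pipe.rpop('requests')
# execute() sends all buffered commands to the server and returns
# a list of responses, one for each command (None if the list was empty).
results = pipe.execute()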
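To pop a configurable number of items rather than a fixed four, the buffered calls can go in a loop. This is a rough sketch rather than a drop-in utility: rpop_many is a made-up name, and client is assumed to be a redis-py connection.

```python
def rpop_many(client, key, count):
    """Pop up to `count` items from the tail of a list in one round trip.

    `rpop_many` is a hypothetical helper; `client` is assumed to be a
    redis-py client.
    """
    pipe = client.pipeline()
    for _ in range(count):
        pipe.rpop(key)  # buffered locally, nothing is sent yet
    replies = pipe.execute()  # all RPOPs go to the server together
    # RPOP replies with None once the list is empty, so drop those.
    return [item for item in replies if item is not None]
```

If your server is Redis 6.2 or later, RPOP itself also accepts an optional count argument, which makes the loop unnecessary.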
Upvotes: 3
Reputation: 2394
You can approach this from a different angle. Your requirement is:
requirement ... to generate the report which should contain the count of unique request and response every day.
Rather than storing requests in lists and then post-processing the results, why not use Redis features to solve the actual requirement and avoid the problem of bulk pushes and pops altogether?
If all we want is to record the unique counts, then you may want to consider using sorted sets.
It could look like this:
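# Collect the request statistics in the sorted set.
# The key includes the date so we can do the "by date" stats.
key = 'requests:date'
# Increment the score of this request by 1.
# redis-py 3.0+ takes (name, amount, value); 2.x took (name, value, amount).
r.zincrby(key, 1, request)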
Python code:
# ZSCAN: iterate over all members in the set in batches of about 10.
# The members come back in no particular order.
# zscan_iter yields tuples (member, score).
batchSize = 10
for memberTuple in r.zscan_iter(key, match=None, count=batchSize):
    member = memberTuple[0]
    score = memberTuple[1]
    print(str(member) + ' --> ' + str(score))
# ZRANGE: get all members in the set, ordered by score.
# Here maxRank = -1 means "no max".
minRank = 0
maxRank = -1
for memberTuple in r.zrange(key, minRank, maxRank, desc=False, withscores=True):
    member = memberTuple[0]
    score = memberTuple[1]
    print(str(member) + ' --> ' + str(score))
This solves the actual requirement: it reports on the count of unique requests by day.
There is no need to post-process anything.
You can do additional queries like "top requests" out of the box :)
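To make the "by day" and "top requests" parts concrete, here is a small sketch that builds the key from the date and reads the report back with ZCARD and ZREVRANGE. The function names, the requests: key prefix and the shape of the returned dict are my own assumptions. client is assumed to be a redis-py 3.0+ connection, where zincrby takes (name, amount, value).

```python
def count_request(client, day, request):
    """Bump the counter for `request` in the sorted set for `day`."""
    client.zincrby("requests:" + day, 1, request)


def daily_report(client, day, top_n=3):
    """Return the number of unique requests and the top_n most frequent ones."""
    key = "requests:" + day
    top = []
    if top_n > 0:
        # Highest scores first; each entry is a (member, score) tuple.
        top = client.zrevrange(key, 0, top_n - 1, withscores=True)
    return {
        "unique_requests": client.zcard(key),  # one member per unique request
        "top_requests": top,
    }
```

ZCARD gives the number of distinct requests for the day, while the scores give how often each one was seen.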
Upvotes: 1