Reputation: 6252
I have an idea to store view counts for posts on my site, but am having trouble which methods to use and how to design it. My users post many small posts so there will be a lot of unique posts
My requirements are the following:
The posts are stored in a Mysql cloud database. I will need to periodically pull data from Redis to update Mysql with the views.
I am not concerned with EXACT accuracy, the views show can be a little less (~10% less than actual is tolerable).
In regards to the accuracy the views do not have to be in realtime. For newer posts I would like them to be updated every 30 minutes, for older posts (older than a week) I can update the views once daily.
Ideally every view would be unique(based on user) and so each view would count as that user has viewed the post once. A user could not view a post more than once. This would be ideal, but I would have to store every single view every user makes for the lifetime of the application and always check if the user has already viewed this post and this is just not feasible. So what I would like are the views need to be unique to a user for a limited time. So if a new post comes out and Bob views it this will count as 1 view. If he views it again in 10 minutes the count will still be as it was previously 1. If Bob views it again in 1 hour this would count as a new view. If the post was older than a week then Bob's views should only count if he views more than once a day.
The flow pseudocode:
//Page view comes to server, lets add the unique postId.
//Using SADD because it will not insert a key that already exists. Not sure of the difference between sadd and pfadd?
sadd "post:tracking", @post.id
//Add the unique user who viewed this page
sadd "post:#{@post.id}:uniques", @userId
I then run a cronjob which will do the following:
//Run this every 30 minutes
loop through smembers(post:tracking).For each post do{
//Get number of views for this post:
var cnt = SCARD("post:@postId:uniques")
//post to database new count
}
Every 30 minutes it will loop through all the sets (posts) and get the cardinality(count) of the members in that set and update the Mysql database. There is one problem here and that is I am not distinguishing between new posts (not older than 1 week) and older posts (older than 1 week). In my example the time the posts were created is not included because I am not sure where to store this information and this is what I am looking for help with. I am looking at making this more efficient.
Upvotes: 1
Views: 541
Reputation: 1660
Have you looked at hyperloglogs?. They store unique things (viewers) very efficiently, with pretty good accuracy (around 1%). You could leave them running and pull stats from them every hour/day/week to get overall viewer figures.
Upvotes: 1