Reputation: 6058
I use the django-activity-stream module to collect user activity. When one user (user1) follows another (user2), I need to fetch the activity stream of the followed user (user2) and combine all activities, sorted by date and time (see the code below).
As the activity list grows, I expect to run into performance and optimisation issues. I believe someone has already solved similar problems.
Any ideas or advice on how to make activity generation more efficient?
    def build_activity(raw_activity):
        activity = []
        for item in raw_activity:
            action_object = get_action_object(item)
            activity.append({
                'user': User.objects.get(pk=int(item.actor_object_id)),
                'verb': item.verb,
                'action_object': action_object[1],
                'type': action_object[0],
                'timestamp': timesince(item.timestamp),
                'datetime': item.timestamp,
            })
        return activity

    def activity_stream(user):
        from actstream.models import actor_stream
        raw_activity = actor_stream(user)
        activity = build_activity(raw_activity)
        for following in Follow.objects.filter(user=user):
            stream = actor_stream(following.target_user)
            activity += build_activity(stream)
        return sorted(activity, key=lambda item: item['datetime'], reverse=True)
Thanks,
Sultan
Upvotes: 3
Views: 1071
Reputation: 3155
Over at Fashiolista we've open-sourced our approach to building feed systems: https://github.com/tschellenbach/Feedly. It's currently the largest open source library aimed at solving this problem. I think it also addresses your trade-off between development time and premature optimization. :)
To start out, I would use Redis as the data store. Later, when your site gets larger, it often makes sense to move to Cassandra.
The same team that built Feedly also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. At the moment we have client APIs for Python, Ruby, Node and PHP. In addition, since it's based on a heavily optimized Cassandra setup, we can price it far below what a self-hosted solution based on Redis would cost you.
In addition, have a look at this High Scalability post, where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.
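To give a rough idea of what a Redis-backed feed usually does, here is a minimal sketch of the "fan-out on write" pattern: each new activity is copied into the feed of every follower when it is created, so reading a feed is a single lookup instead of a merge. This is not Feedly's API. The class and method names are hypothetical, and a plain Python dict stands in for Redis (with a real store, each per-reader list would typically be a sorted set scored by timestamp).

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta


class InMemoryFeeds:
    """Hypothetical stand-in for a Redis-like store: one feed per reader."""

    def __init__(self, max_length=100):
        self.followers = defaultdict(set)  # actor -> readers who follow the actor
        self.feeds = defaultdict(list)     # reader -> [(-timestamp, activity_id)]
        self.max_length = max_length

    def follow(self, reader, actor):
        self.followers[actor].add(reader)

    def add_activity(self, actor, activity_id, timestamp):
        # Fan out on write: insert into the actor's own feed and into
        # the feed of every follower, keeping each feed newest-first.
        key = (-timestamp.timestamp(), activity_id)
        for reader in {actor} | self.followers[actor]:
            feed = self.feeds[reader]
            bisect.insort(feed, key)
            del feed[self.max_length:]  # trim the oldest entries

    def read(self, reader, limit=20):
        # Reading is a slice of a precomputed list; no merging or sorting.
        return [activity_id for _, activity_id in self.feeds[reader][:limit]]


feeds = InMemoryFeeds(max_length=3)
feeds.follow("user1", "user2")  # user1 follows user2
t0 = datetime(2014, 1, 1)
feeds.add_activity("user2", "a1", t0)
feeds.add_activity("user1", "a2", t0 + timedelta(minutes=1))
feeds.add_activity("user2", "a3", t0 + timedelta(minutes=2))
print(feeds.read("user1"))  # ['a3', 'a2', 'a1']
print(feeds.read("user2"))  # ['a3', 'a1']
```

The trade-off is more work and storage at write time in exchange for cheap reads, which tends to pay off when feeds are read far more often than activities are created.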
To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:
Upvotes: 3
Reputation: 1153
Premature optimization is the root of all evil.
But if I were going to optimize this, I might generate another stream, with the timestamps for its actions set from the action_object timestamp... :)
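If profiling does show the combine step in the question to be a problem, one low-effort change is to merge the per-user streams lazily and stop after the first page, rather than concatenating everything and sorting it. The sketch below assumes each stream is already ordered newest-first (worth verifying for your actstream version) and that items carry a 'datetime' key as in the question's build_activity; merge_streams is a hypothetical helper name.

```python
import heapq
from datetime import datetime
from itertools import islice


def merge_streams(streams, limit=20):
    """Merge newest-first streams and return only the first `limit` items."""
    merged = heapq.merge(
        *streams,
        key=lambda item: item["datetime"],
        reverse=True,  # inputs are sorted from newest to oldest
    )
    return list(islice(merged, limit))


own = [
    {"verb": "posted", "datetime": datetime(2014, 1, 3)},
    {"verb": "liked", "datetime": datetime(2014, 1, 1)},
]
followed = [
    {"verb": "commented", "datetime": datetime(2014, 1, 2)},
]
page = merge_streams([own, followed], limit=2)
print([item["verb"] for item in page])  # ['posted', 'commented']
```

This needs Python 3.5 or later for the key and reverse arguments of heapq.merge. It does not remove the per-item User lookup in build_activity, which is a separate cost.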
Upvotes: 1
Reputation: 2561
Unless I have a verifiable performance issue, I personally dislike premature optimization, as it has often become an endless spiral into insanity for me. You might find this to be the case here as well.
Upvotes: 2