sultan

Reputation: 6058

Django: collecting an activity stream of users and objects

I use the django-activity-stream module to collect user activity. When one user (user1) follows another (user2), I need to fetch the activity stream of the followed user (user2) and combine all activities, sorted by date and time (see code below).

As the activity list grows, I expect to run into performance and optimisation issues. I believe someone has already solved similar problems.

Any ideas or advice on how to make activity generation more efficient?

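from django.contrib.auth.models import User
from django.utils.timesince import timesince
from actstream.models import actor_stream

# Follow (a model with user and target_user fields) and get_action_object(),
# which returns a (type, object) pair, are this project's own code (not shown).


def build_activity(raw_activity):
    # Convert actstream Action rows into plain dicts for the template.
    activity = []
    for item in raw_activity:
        action_object = get_action_object(item)
        activity.append({
            # Note: this runs one User query per action.
            'user': User.objects.get(pk=int(item.actor_object_id)),
            'verb': item.verb,
            'action_object': action_object[1],
            'type': action_object[0],
            'timestamp': timesince(item.timestamp),  # human-readable age
            'datetime': item.timestamp,
        })
    return activity


def activity_stream(user):
    # The user's own actions plus the actions of everyone they follow.
    activity = build_activity(actor_stream(user))
    for following in Follow.objects.filter(user=user):
        activity += build_activity(actor_stream(following.target_user))
    # Newest first; this sorts the complete combined list on every call.
    return sorted(activity, key=lambda item: item['datetime'], reverse=True)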
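A minimal sketch of two cheaper alternatives, written in plain Python so it runs without Django. `attach_actors` (hypothetical) resolves every actor id with one batched call instead of one `User.objects.get` per action; `User.objects.in_bulk` returns an `{id: user}` dict, so it should fit the assumed `fetch_many` shape. `merged_feed` (also hypothetical) assumes each per-user stream is already sorted newest-first and merges them lazily with `heapq.merge`, stopping after one page instead of building and sorting the full list. Plain dicts stand in for actstream `Action` rows.

```python
import heapq
from datetime import datetime
from itertools import islice
from typing import Callable, Dict, Iterable, List


def attach_actors(items: List[dict],
                  fetch_many: Callable[[List[int]], Dict[int, object]]) -> List[dict]:
    """Resolve every actor id with one batched call instead of one query per item.

    fetch_many is assumed to take a list of ids and return {id: user};
    with Django, User.objects.in_bulk has that shape.
    """
    ids = sorted({int(item["actor_object_id"]) for item in items})
    users = fetch_many(ids)
    return [dict(item, user=users.get(int(item["actor_object_id"]))) for item in items]


def merged_feed(streams: Iterable[Iterable[dict]], limit: int = 20) -> List[dict]:
    """Merge per-user streams that are each already sorted newest-first.

    heapq.merge keeps one pending item per stream, so only the items needed
    for the first `limit` results are pulled from (lazy) inputs.
    """
    merged = heapq.merge(*streams, key=lambda item: item["datetime"], reverse=True)
    return list(islice(merged, limit))


# Usage with plain dicts standing in for actstream Action rows.
own = [
    {"actor_object_id": "1", "verb": "posted", "datetime": datetime(2013, 5, 3, 12, 0)},
    {"actor_object_id": "1", "verb": "liked", "datetime": datetime(2013, 5, 1, 9, 0)},
]
followed = [
    {"actor_object_id": "2", "verb": "commented", "datetime": datetime(2013, 5, 2, 18, 30)},
]
page = merged_feed([own, followed], limit=2)
print([item["verb"] for item in page])  # ['posted', 'commented']
page = attach_actors(page, lambda ids: {i: "user%d" % i for i in ids})
print([item["user"] for item in page])  # ['user1', 'user2']
```

Merging first and attaching actors afterwards means users are only looked up for the page actually shown. The merge only pays off if the per-user streams are produced lazily or are already capped at `limit` items each; building every full list first would still cost as much as before.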

Thanks,

Sultan

Upvotes: 3

Views: 1071

Answers (3)

Thierry

Reputation: 3155

At Fashiolista we've open-sourced our approach to building feed systems in Feedly (https://github.com/tschellenbach/Feedly). It's currently the largest open-source library aimed at solving this problem. I think it also addresses your trade-off between development time and premature optimization. :)

To start out, I would use Redis as the data store. Later, when your site gets larger, it often makes sense to move to Cassandra.
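A minimal in-memory sketch of the fan-out-on-write idea behind this kind of feed setup, as I understand it: each new activity is copied into the precomputed feed of the actor and of every follower, so reading a feed is one cheap slice. A dict of sorted lists stands in for per-user Redis sorted sets; `FanoutFeeds`, its methods and `max_length` are hypothetical and not Feedly's API.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (timestamp, activity_id) pairs, kept in ascending order like a sorted set's scores.
Entry = Tuple[float, str]


class FanoutFeeds:
    """In-memory stand-in for per-user Redis sorted sets (hypothetical helper)."""

    def __init__(self, max_length: int = 100) -> None:
        self.max_length = max_length  # cap each feed, like trimming a sorted set
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.feeds: Dict[str, List[Entry]] = defaultdict(list)

    def follow(self, user: str, target: str) -> None:
        self.followers[target].add(user)

    def add_activity(self, actor: str, activity_id: str, timestamp: float) -> None:
        # Fan out on write: copy the activity into the actor's own feed
        # and into the feed of everyone who currently follows the actor.
        for owner in {actor} | self.followers[actor]:
            feed = self.feeds[owner]
            bisect.insort(feed, (timestamp, activity_id))
            if len(feed) > self.max_length:
                del feed[0]  # drop the oldest entry

    def read_feed(self, user: str, limit: int = 20) -> List[str]:
        # Reading is one slice of a precomputed list, newest first.
        feed = self.feeds.get(user, [])
        newest = feed[max(len(feed) - limit, 0):]
        return [activity_id for _, activity_id in reversed(newest)]


feeds = FanoutFeeds(max_length=3)
feeds.follow("user1", "user2")
feeds.add_activity("user2", "a1", timestamp=1.0)
feeds.add_activity("user1", "a2", timestamp=2.0)
feeds.add_activity("user2", "a3", timestamp=3.0)
print(feeds.read_feed("user1"))  # ['a3', 'a2', 'a1']
print(feeds.read_feed("user2"))  # ['a3', 'a1']
```

Writes now cost one insert per follower, while reads no longer depend on how many people a user follows. In this sketch a follow only affects future activities; copying older items into a new follower's feed would need a separate backfill step.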

The same team that built Feedly also offers a hosted API that handles the complexity for you. Have a look at getstream.io. At the moment we have client APIs for Python, Ruby, Node and PHP. In addition, since it's based on a heavily optimized Cassandra setup, we can price it far below what a self-hosted solution based on Redis would cost you.

Also have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html

This tutorial will help you set up a feed system like Pinterest's using Redis. It's quite easy to get started with.

To learn more about feed design, I highly recommend reading some of the articles we based Feedly on:

Upvotes: 3

Zhe Li

Reputation: 1153

Premature optimization is the root of all evil.

But if I were going to optimize this, I might generate another stream, with the timestamps of its actions set from the action_object timestamps... :)

Upvotes: 1

patrickn

Reputation: 2561

Unless I have a verifiable performance issue, I avoid premature optimization; for me it has often become an endless spiral into insanity. You might find this to be the case here as well.

Upvotes: 2
