minou

Reputation: 16563

Creating your own activity logging in GAE/P

I'd like to log user activity in my app, both for presentation to users and for administrative purposes. My customers are companies, so there are three levels at which I may be presenting activity:

  1. Activity of a single user
  2. Activity of all users of a company
  3. All activity

To do the logging, I would create a model to store the log entries. I see a few ways of doing this.

First, I could store each logged activity in its own entity and then query as needed:

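from google.appengine.ext import ndb

class Activity(ndb.Model):
    activity = ndb.StringProperty()    # description of what was done
    user_id = ndb.StringProperty()     # who did it
    company_id = ndb.StringProperty()  # the company that user belongs to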

Second, I could store all activity of a user in a single entity:

class UserActivity(ndb.Model):
    activity = ndb.StringProperty(repeated=True) # Note this is now a list
    company_id = ndb.StringProperty()

Third, I could store all activity of a company in a single entity:

class CompanyActivity(ndb.Model):
    activity = ndb.StringProperty(repeated=True) # Would store user_id here somehow

What are the functionality/performance tradeoffs in the three options? I understand that there are potential contention issues with the second and third options if there are frequent put transactions, but let's assume that is not an issue for the sake of discussion.

For the second and third options, are there any significant advantages in reducing the total number of datastore entities (since they would be consolidated into fewer entities)? Or should I just go with the first option?

Upvotes: 5

Views: 134

Answers (2)

andruso

Reputation: 1995

I would also suggest the first approach but with KeyProperty:

class Activity(ndb.Model):
    activity = ndb.StringProperty()
    user_id = ndb.KeyProperty(kind='User')
    company_id = ndb.KeyProperty(kind='Company')

Code would be much cleaner from the start and you can always fine-tune later.

For the rest, Dan has covered the most important points really well.

Upvotes: 0

Dan Cornilescu

Reputation: 39834

The only advantage of using the repeated property would be that you'd avoid the eventual consistency problem: whenever you read a UserActivity or CompanyActivity entity, you know you are getting the complete list of all activities. With the 1st approach you'd have to make a query to obtain such a list, and the list may miss very recent activities because the respective query index may not yet have been updated to reflect them.

But, in addition to the potential contention problem you mentioned, there is another disadvantage to consider for the repeated property approach: these entities keep growing as more and more activities are added to the list, which translates into:

  • progressively slower get()/put() times, so gradually deteriorating overall app performance
  • the risk of hitting the max datastore entity size (~ 1MB, see Limits), which would require additional logic for splitting the list across multiple entities
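
The splitting logic mentioned in the second bullet could look roughly like the sketch below. It is plain Python rather than ndb code, and it rests on several assumptions: split_into_chunks and MAX_CHUNK_BYTES are hypothetical names, the byte budget is an arbitrary safety margin below the ~1MB limit, and only the UTF-8 size of the strings is counted, ignoring whatever per-property overhead the datastore adds.

```python
# Hypothetical budget, deliberately below the ~1MB entity limit,
# because this sketch ignores the datastore's own per-property overhead.
MAX_CHUNK_BYTES = 900 * 1000


def split_into_chunks(activities, max_bytes=MAX_CHUNK_BYTES):
    """Group activity strings into lists whose UTF-8 size fits in max_bytes.

    Each returned list would be stored in its own entity. A single entry
    larger than max_bytes still gets a chunk of its own (over budget).
    """
    chunks = []
    current = []
    current_size = 0
    for activity in activities:
        size = len(activity.encode("utf-8"))
        # Start a new chunk when adding this entry would exceed the budget.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current = []
            current_size = 0
        current.append(activity)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Every write would then have to locate the last chunk and possibly create a new one, and every read would have to fetch all chunks, which is the extra logic the 1st approach avoids.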

The 3rd approach in particular will also require a less trivial method of obtaining per-user activity reports.
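
To illustrate why, here is a minimal plain-Python sketch of a per-user report under the 3rd approach. It assumes the user ID is packed into each list entry as "user_id|activity"; that encoding, the SEPARATOR constant, and both helper names are hypothetical, since the question leaves open how the user ID would be stored.

```python
SEPARATOR = "|"  # hypothetical delimiter; must never appear in a user ID


def encode_entry(user_id, activity):
    """Pack the user ID and the activity text into one list entry."""
    return user_id + SEPARATOR + activity


def activities_for_user(entries, user_id):
    """Scan a company's whole activity list and keep one user's entries."""
    result = []
    for entry in entries:
        # Split on the first separator only, so activity text may contain "|".
        entry_user, _, activity = entry.partition(SEPARATOR)
        if entry_user == user_id:
            result.append(activity)
    return result


company_entries = [
    encode_entry("u1", "logged in"),
    encode_entry("u2", "uploaded a|b"),
    encode_entry("u1", "logged out"),
]
user_report = activities_for_user(company_entries, "u1")
```

The whole company entity has to be loaded and scanned in memory for every per-user report, whereas with the 1st approach the same report is a query filtered on user_id.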

I'd stick with the 1st approach. It's the most flexible and scalable one, and its disadvantages are minor:

  • the eventual consistency problem is IMHO not a show-stopper (and there could be ways of reducing its impact)
  • the extra storage space (for the user/company ID properties stored in each Activity entity, plus larger indexes due to the higher number of entities) is IMHO well worth it (storage is cheap).

Upvotes: 3
