Reputation: 43
I've got a design question regarding Google's database Cloud Datastore. Let me explain it by using an example:
I've got Entities of the kind "Article" with the following properties:
SumOfScore should be the sum of all related "Score" entities, which have properties like:
In Pseudo-SQL:
sumOfScore = select sum(score) from Score where score.articleId = article.id
I see two possibilities to design this (using Google' datastore API):
1.) No property sumOfScore for Articles; but query always:
This means: Every time an article is read, I need to do an query for this specific article for calculating the sumOfScore. Imagine a list of 100 Articles that is shown to a user. This would need additional 100 queries to the database, just to show the score for each article.
Nevertheless: This would be my preferred way when using a Relational-DB. No redundancy and good normalization. And with SQL you can use just one join-select to catch all data. But it doesn't feel right for Cloud Datastore.
2.) Calculate the sumOfScore whenever Score entities are changed:
This means: Whenever a Score-Entity is added, removed or changed, the related Article updates the sumOfScore property.
Advantage: When reading articles no additional queries are needed. The sumOfScore is redundant on the entity itself.
Disadvantage: Every time a score is changed, there is one additional query and an additional write (updating an Article entity). And sumOfScore may mismatch with the actual Score entities (e.g. value is changed via DB-Console)
What are more experienced people think? Is there a common best practice for such scenario? What are doing the JPA or JDO implementation under the hood?
Thanks a lot
Mos
Upvotes: 4
Views: 498
Reputation: 2445
Theres third possibility which doesn't make a compromise.
You make Score a child of Article, and keep the sumOfScore in Article. For sorting purposes, this field will come in handy. As this two classes are from the same entity group, you can create a Score and update the Article in a transaction. You could even double check by querying all the Score who's parent is a given Article.
The problem with this approach, is that you can only update an entity 5 times per second. If you think you'll have much more activity than that (remember, it's just a limitation on a single entity not the entier table), you should check out sharded counter tutorial or see the google io's video explaining this..
Heres a great discussion about this same topic: How does Google Moderator avoid contention?
Upvotes: 1
Reputation: 6179
The first thing I recommend you look into the GAE article about sharding counters.
That is an article from the GAE best practices relating to how you should be handling counters/sums. It can be a little tricky because every time you update an element you have to use logic to randomly pick a sharded counter; and when you retrieve your count you're actually fetching a group of entities and summing them. I've gone this route but won't provide code here on how I did it because I haven't battle tested it yet. But your code can get sloppy in a hurry if you just copy/paste the sample sharding code all over the place, so make an abstract or typed counter class to reuse your sharding logic if you decide to go this route.
Another alternative would be to use a fuzzy count. This method uses memcache and offers better performance at the cost of accuracy.
See the section here labeled "Transient and frequently updated data"
And the last alternative; is to just use SQL. Its experimental and hot out of the oven (in relation to being used on GAE) but it might be worth looking into.
Upvotes: 2