YoungDinosaur

Reputation: 1560

Google App Engine Saves and Avoiding Fetching Stale Data

I've been using App Engine for a few years, and one problem that pops up again and again with the HRD (High Replication Datastore) is that when you update data on one screen, you will sometimes retrieve the old data on the read-only version of that screen if the read-only request is made within a second or two of the update. I know that this has been discussed here and here. I understand what the problem is, but I'm wondering what the proper solution might be. Here's an example of where I could end up with stale data.

Say you build a shopping list app.

Entity 1: ShoppingList
Entity 2: ShoppingListItem

The ShoppingList entity has a field that keeps a running total of the number of items in the list. The second entity is ShoppingListItem. When you add or delete a ShoppingListItem, you need to update the total on the ShoppingList. You have two ways to update the total.

  1. Query the database and count the ShoppingListItem entities that belong to the list.
  2. Increment or decrement the cached total field by 1 without ever counting the items in the database.

With both solutions, you end up with stale data. #1 could be out of date because the original save of the add/delete may not have propagated yet, so the query would miss the new record. #2 will have stale data when you add or remove multiple items in quick succession. The total is updated for save 1, but when save 2 runs, save 1 may still be propagating, so the query for the ShoppingList total during save 2 will still reflect the original state before save 1.
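
To make the two failure modes concrete, here is a minimal sketch in plain Python. It does not use the App Engine API; `LaggingStore`, `settle()`, and the key tuples are made-up names for a toy store in which writes only become visible to reads after a propagation step. It assumes, as described above, that the total is fetched with an eventually consistent read.

```python
class LaggingStore:
    """Toy store: committed writes are invisible to reads until settle() runs."""

    def __init__(self):
        self._visible = {}  # what eventually consistent reads currently see
        self._pending = []  # committed writes that have not propagated yet

    def put(self, key, value):
        self._pending.append((key, value))

    def read(self, key, default=None):
        return self._visible.get(key, default)

    def count_items(self, list_id):
        """Approach #1: count the items that are currently visible."""
        return sum(
            1 for key in self._visible
            if key[0] == "item" and key[1] == list_id
        )

    def settle(self):
        """Simulate propagation: apply pending writes in commit order."""
        for key, value in self._pending:
            self._visible[key] = value
        self._pending.clear()


def add_item_incrementing(store, list_id, item_id):
    """Approach #2: add one to the stored total, based on a possibly stale read."""
    store.put(("item", list_id, item_id), True)
    total = store.read(("total", list_id), 0)
    store.put(("total", list_id), total + 1)


store = LaggingStore()
add_item_incrementing(store, "groceries", "milk")
add_item_incrementing(store, "groceries", "eggs")  # runs before save 1 is visible

count_before_settle = store.count_items("groceries")  # approach #1 misses both items
store.settle()
stored_total = store.read(("total", "groceries"))      # approach #2 lost an increment
actual_items = store.count_items("groceries")

print(count_before_settle, stored_total, actual_items)
```

In this toy model the count taken right after the saves is 0, and once everything has propagated the stored total is 1 even though 2 items exist, because both saves read the same starting total.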

So my question is, what is the proper way to fix this? To me it seems like you have two choices:

  1. Always work off of a memcached entity when using cached totals. Saves will not clear the memcache but instead inject the updated entity directly back into it. That keeps memcache up-to-date. This will of course mean that queries that do not hit memcache (like getting a list of all ShoppingList entities) may still return stale data. But at least when all of the saves are done, the cached total in the database will be correct.
  2. Accept that you are going to have stale data, and figure out a way to clean it up later. Either schedule a TaskQueue task to requery that ShoppingList and update the total in 10-30 seconds, or have a scheduled cron job that runs every 30 seconds and looks for all ShoppingList entities that have been updated.
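
Both choices can be sketched roughly as follows. This is plain Python with two dicts standing in for memcache and the datastore; `load_list`, `save_list`, `change_total`, and `reconcile_total` are hypothetical helper names, not App Engine calls. A real memcache can evict entries at any time and concurrent requests can race on the same key, which this sketch ignores.

```python
cache = {}      # stand-in for memcache (assumption: never evicts)
datastore = {}  # stand-in for the datastore, keyed by list id


def load_list(list_id):
    """Prefer the cached copy; fall back to the datastore and repopulate the cache."""
    entity = cache.get(list_id)
    if entity is None:
        entity = dict(datastore.get(list_id, {"total": 0}))
        cache[list_id] = entity
    return entity


def save_list(list_id, entity):
    """Choice 1: persist, then overwrite the cache entry instead of clearing it."""
    datastore[list_id] = dict(entity)
    cache[list_id] = dict(entity)


def change_total(list_id, delta):
    """Apply +1 or -1 to the total, always starting from the cached entity."""
    entity = dict(load_list(list_id))
    entity["total"] += delta
    save_list(list_id, entity)
    return entity["total"]


def reconcile_total(list_id, counted_items):
    """Choice 2: a delayed task overwrites the total with a fresh count."""
    entity = dict(load_list(list_id))
    entity["total"] = counted_items
    save_list(list_id, entity)
    return entity["total"]


first = change_total("groceries", +1)
second = change_total("groceries", +1)   # sees the first update through the cache
third = change_total("groceries", -1)
repaired = reconcile_total("groceries", 1)
```

Because every save reads from and writes back to the cache, back-to-back updates build on each other rather than on a stale copy. The reconciliation step is only a safety net for cases where the cache and the stored total drift apart.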

I'm leaning towards solution #1. Thoughts?

Upvotes: 0

Views: 155

Answers (2)

YoungDinosaur

Reputation: 1560

I think in most cases @david-w-smith's answer is correct. However, as I mentioned in my comment on his answer, there is a big limitation to using entity groups: an entity group can only handle about one write per second. That's probably fine for applications with just a few users, but it would not be acceptable for, say, a social network.

As was also suggested by the Google documentation (https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency), I have built a rather extreme caching layer using memcache. All ShoppingList query results are stored in the cache, and whenever a list is added, updated, or deleted I also update the cached query results to match. I worry about the inherent complexity of maintaining two copies of the data (database and memcache), but that's what unit tests are for. Right?

Upvotes: 0

Dave W. Smith

Reputation: 24966

Option 3 is to give https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency and https://developers.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency a careful read. What you're encountering is a classic manifestation of "Eventual Consistency". Pay particular attention to Entity Groups and ancestor queries.
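
As a rough illustration of why ancestor queries help, here is a toy model in plain Python. It is a simplification and not the datastore API: `ToyDatastore`, `ancestor_query`, `global_query`, and `catch_up` are invented names, and the model simply assumes that reads scoped to one entity group see every committed write while the global index lags behind.

```python
class ToyDatastore:
    """Toy model: each entity group is always current; the global index lags."""

    def __init__(self):
        self._groups = {}        # root key -> {child key: entity}, always current
        self._global_index = {}  # (root key, child key) -> entity, updated lazily
        self._unindexed = []     # writes the global index has not caught up on

    def put(self, root, key, entity):
        self._groups.setdefault(root, {})[key] = entity
        self._unindexed.append((root, key, entity))

    def ancestor_query(self, root):
        """Scoped to one group, so it sees every committed write in that group."""
        return list(self._groups.get(root, {}).values())

    def global_query(self, predicate):
        """Reads the lagging index, so recent writes may be missing."""
        return [e for e in self._global_index.values() if predicate(e)]

    def catch_up(self):
        """Simulate the global index eventually catching up."""
        for root, key, entity in self._unindexed:
            self._global_index[(root, key)] = entity
        self._unindexed.clear()


db = ToyDatastore()
root = "ShoppingList:groceries"
db.put(root, "ShoppingListItem:milk", {"list": "groceries", "name": "milk"})
db.put(root, "ShoppingListItem:eggs", {"list": "groceries", "name": "eggs"})


def in_groceries(entity):
    return entity["list"] == "groceries"


ancestor_count = len(db.ancestor_query(root))          # counts both new items
global_count_before = len(db.global_query(in_groceries))  # index has not caught up
db.catch_up()
global_count_after = len(db.global_query(in_groceries))
```

In this sketch, making each ShoppingListItem a child of its ShoppingList and counting with an ancestor query gives the correct total immediately, while the same count through a global query only becomes correct after the index catches up.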

Upvotes: 4
