ali
ali

Reputation: 531

App engine datastore denormalization: index properties in the main entity or the denormalized entity?

Consider the classic example of blog data modelling, where we have a Blog entity with many properties, and we want to list the latest blogs in a page.

It makes sense to denormalize the BlogPost entity into a BlogPostSummary entity which will be shown in the list view, avoiding fetching and deserializing many unwanted properties.

class BlogPost(db.Model):
  title = db.StringProperty()
  content = db.TextProperty()
  created = db.DateProperty()
  ...

class BlogPostSummary(db.Model):
  title = db.StringProperty()
  content_excerpt = db.TextProperty()

The question is: which entity should hold the indexed properties? There are 3 options:

1. Index properties in both

2. Index properties in main entity only

3. Index in denormalized entity only

Which option would work better? Are there other options?

Would the double round trip to datastore in option 2 be a problem?

Upvotes: 1

Views: 240

Answers (2)

Brent Washburne
Brent Washburne

Reputation: 13158

If you want to avoid fetching and deserializing unwanted properties, then put all your entities into a single model and use Projection queries:

class BlogPost(db.Model):
  title = db.StringProperty(indexed=True)
  content = db.TextProperty(indexed=False)
  content_excerpt = db.TextProperty(indexed=False)
  created = db.DateProperty(indexed=True)
  ...

posts = BlogPost.query().fetch(projection=["title", "created"])

(Note that the projection fields must be indexed.)

Upvotes: 0

Andrei Volgin
Andrei Volgin

Reputation: 41099

This is an abstract question that does not have a "correct" answer. The choice of a data model depends on specific requirements of a project, including:

  • usage patterns (how often you need access to different data)
  • update patterns (e.g. separating frequently updated properties from stable properties in order to reduce write costs)
  • average performance and extreme-case performance requirements (e.g an average blog may have 10 posts, a very popular blog may have 10,000 posts)
  • ability to use memcache to reduce datastore trips and improve performance
  • data complexity (i.e. how many different kids of entities depend on this particular entity kind)
  • transaction requirements
  • security and access roles considerations (e.g. not exposing private data by mistake)

By the way, there is another way to model data in the Datastore - using child entities. For example, blog posts may be child entities of a blog entity. This way you can retrieve all blog posts with a single query by providing a parent key - without storing post IDs or keys in the blog entity or blog ID/key in the post entities.

Upvotes: 2

Related Questions