Damian

Reputation: 5561

App engine datastore: How to implement Posts and Tags without joins?

I'm building an application on Google App Engine (Java) where users can make posts, and I'm thinking of adding tags to these posts, so I will have something like this:

in entity Post:

public List<Key> tags;

in entity Tag:

public List<Key> posts;

It would be easy to query, for example, all posts with a certain tag, but how could I get all the posts that have every tag in a given list? I could make a query for each tag and then intersect the results, but maybe there is a better way, because that would be slow with a lot of posts.
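A minimal sketch of the query-per-tag-then-intersect approach described above. The datastore is replaced by a hypothetical in-memory map from tag to the set of post IDs carrying it; `TagIntersection` and `postsWithAllTags` are illustrative names, not App Engine API.

```java
import java.util.*;

// Hypothetical in-memory stand-in for the datastore (not App Engine API):
// each map lookup plays the role of one "posts with this tag" query.
class TagIntersection {
    static Set<Long> postsWithAllTags(Map<String, Set<Long>> postsByTag, List<String> tags) {
        Set<Long> result = null;
        for (String tag : tags) {
            Set<Long> matches = postsByTag.getOrDefault(tag, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(matches); // first tag: start from its posts
            } else {
                result.retainAll(matches);       // later tags: keep only common posts
            }
            if (result.isEmpty()) {
                break;                           // nothing left to intersect
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> postsByTag = new HashMap<>();
        postsByTag.put("java", new HashSet<>(Arrays.asList(1L, 2L, 3L)));
        postsByTag.put("appengine", new HashSet<>(Arrays.asList(2L, 3L, 4L)));
        System.out.println(postsWithAllTags(postsByTag, Arrays.asList("java", "appengine")));
    }
}
```

Every per-tag result set is loaded in full before intersecting, which is exactly the cost that grows with the number of posts.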

Something that may be harder: given a post, get the posts that share tags with it, ordered by the number of tags in common, so I could show posts that are "similar" to it in some way.
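To make the requirement concrete, here is a small in-memory sketch of "rank other posts by shared tag count". It assumes the tag data has already been loaded into plain maps (hypothetical; no datastore calls), and `SimilarPosts`, `invert` and `rankBySharedTags` are illustrative names only.

```java
import java.util.*;

// Hypothetical in-memory sketch (not App Engine API): rank other posts by the
// number of tags they share with a given post.
class SimilarPosts {
    // Builds tag -> post IDs from post ID -> tags, like the Tag entity's posts list.
    static Map<String, Set<Long>> invert(Map<Long, List<String>> tagsByPost) {
        Map<String, Set<Long>> postsByTag = new HashMap<>();
        for (Map.Entry<Long, List<String>> e : tagsByPost.entrySet()) {
            for (String tag : e.getValue()) {
                postsByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(e.getKey());
            }
        }
        return postsByTag;
    }

    static List<Long> rankBySharedTags(long postId, Map<Long, List<String>> tagsByPost,
                                       Map<String, Set<Long>> postsByTag) {
        Map<Long, Integer> shared = new HashMap<>();
        for (String tag : tagsByPost.getOrDefault(postId, Collections.emptyList())) {
            // One lookup per tag of the source post (one query per tag in the datastore).
            for (long other : postsByTag.getOrDefault(tag, Collections.emptySet())) {
                if (other != postId) {
                    shared.merge(other, 1, Integer::sum); // one more tag in common
                }
            }
        }
        List<Long> ranked = new ArrayList<>(shared.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(shared.get(b), shared.get(a)); // most shared first
            return byCount != 0 ? byCount : Long.compare(a, b);          // tie: lower ID first
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> tagsByPost = new HashMap<>();
        tagsByPost.put(1L, Arrays.asList("java", "appengine", "datastore"));
        tagsByPost.put(2L, Arrays.asList("java", "appengine"));
        tagsByPost.put(3L, Arrays.asList("java"));
        tagsByPost.put(4L, Arrays.asList("python"));
        // Post 2 shares two tags with post 1, post 3 shares one, post 4 shares none.
        System.out.println(rankBySharedTags(1L, tagsByPost, invert(tagsByPost)));
    }
}
```

The counting itself is cheap; the hard part on App Engine is that every per-tag lookup here would be a separate query whose results must all be fetched.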

Well, with joins this would be a lot easier, but I'm new to App Engine and can't think of a good way to replace joins.

Thanks!

Upvotes: 10

Views: 2661

Answers (3)

onejigtwojig

Reputation: 4851

See @topchef's blog post on this: Efficient Keyword Search with Relation Index Entities and Objectify for Google Datastore. It talks about implementing search with list properties using Relation Index Entities and Objectify.

Upvotes: 1

Heidmo

Reputation:

You might want to check out this video from Google I/O. Relation index entities are what you need: they let you remove the List<Key> posts property from the Tag entity, as well as the List<Key> tags property from the Post entity.
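A rough in-memory sketch of the relation index entity idea, assuming the pattern as it is usually presented: the tag list moves off `Post` into a separate `PostIndex` entity (in the datastore, a child of the Post), a keys-only query on `PostIndex` yields the parent keys, and only the matching Posts are then fetched. All class and method names are hypothetical, and the "datastore" here is just a map and a list.

```java
import java.util.*;

// Hypothetical in-memory model of the relation index entity pattern (not App Engine API).
class RelationIndexSketch {
    // The Post carries no tag list, so loading a Post never drags the tags along.
    static final class Post {
        final long id;
        final String title;
        Post(long id, String title) { this.id = id; this.title = title; }
    }

    // Separate index entity; parentId stands in for the parent Post's key.
    static final class PostIndex {
        final long parentId;
        final List<String> tags;
        PostIndex(long parentId, List<String> tags) { this.parentId = parentId; this.tags = tags; }
    }

    final Map<Long, Post> posts = new HashMap<>();
    final List<PostIndex> indexes = new ArrayList<>();

    void save(Post post, List<String> tags) {
        posts.put(post.id, post);
        indexes.add(new PostIndex(post.id, new ArrayList<>(tags)));
    }

    // Stands in for a keys-only query on PostIndex with one equality filter per tag:
    // it returns parent IDs only and never touches the Post entities.
    List<Long> postIdsWithAllTags(List<String> tags) {
        List<Long> ids = new ArrayList<>();
        for (PostIndex idx : indexes) {
            if (idx.tags.containsAll(tags)) {
                ids.add(idx.parentId);
            }
        }
        return ids;
    }

    // Second step: fetch only the matching Posts by ID (a batch get in the datastore).
    List<Post> fetchPosts(List<Long> ids) {
        List<Post> result = new ArrayList<>();
        for (long id : ids) {
            Post p = posts.get(id);
            if (p != null) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RelationIndexSketch store = new RelationIndexSketch();
        store.save(new Post(1L, "Tags without joins"), Arrays.asList("java", "appengine"));
        store.save(new Post(2L, "Generics question"), Arrays.asList("java"));
        for (Post p : store.fetchPosts(store.postIdsWithAllTags(Arrays.asList("java", "appengine")))) {
            System.out.println(p.title);
        }
    }
}
```

With this split, no Tag entity has to hold a growing list of post keys, and queries on tags do not deserialize the Post bodies.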

Upvotes: 1

Peter Recore

Reputation: 14187

With this design, I'm afraid your Tag entity could be a bottleneck, especially if you expect some tags to be very common. Three specific issues I can think of are the efficiency of your gets and puts, write contention, and exploding indexes. Take Stack Overflow as an example: there are 14,000 posts tagged "java" right now.

  1. That means every time you fetch your java Tag entity, you pull back 14,000 keys' worth of data from the datastore, and then you send it all back when you do a put. That could add up to a lot of bytes.
  2. In addition to the bytes going back and forth, each put requires the indexes to be updated. Each entry in the ListProperty maps to a separate index entry, so you end up doing lots of index updates, which leads us to number 3...
  3. Exploding indexes. Each entity has a limit on how many index entries it can have. I think the limit is 5000 per entity, so that is actually a hard limit on how many posts could ever have the same tag.
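A back-of-the-envelope check of the numbers above. Both constants are assumptions taken from this answer (the 5000 limit is itself given as "I think"), and the one-index-entry-per-list-value model follows point 2; real index costs may differ.

```java
// Back-of-the-envelope arithmetic for a Tag entity holding a list of post keys.
// ASSUMED_MAX_INDEX_ENTRIES is the answer's hedged figure, not a verified limit.
class TagEntityBudget {
    static final int ASSUMED_MAX_INDEX_ENTRIES = 5000;

    // Point 2: each value in the ListProperty maps to a separate index entry.
    static int indexEntries(int postsWithTag) {
        return postsWithTag;
    }

    static boolean fits(int postsWithTag) {
        return indexEntries(postsWithTag) <= ASSUMED_MAX_INDEX_ENTRIES;
    }

    // Point 1: tagging one more post means reading all existing keys (get)
    // and writing them all back plus the new one (put).
    static long keysMovedToAddPost(int existingPosts) {
        return existingPosts + (existingPosts + 1L);
    }

    public static void main(String[] args) {
        int javaPosts = 14_000; // the "java" count quoted above
        System.out.println("index entries needed: " + indexEntries(javaPosts));
        System.out.println("fits under the assumed limit: " + fits(javaPosts));
        System.out.println("keys moved to tag one more post: " + keysMovedToAddPost(javaPosts));
    }
}
```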


The good news is that some of your requirements could be handled easily by the Post entity alone. For example, you could find all the posts that have every tag in a list with a query filter like this:

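Query q = pm.newQuery(Post.class);
// Each equality filter on the list property matches if any element of tags equals the value
q.setFilter("tags == 'java' && tags == 'appengine'");
List<Post> results = (List<Post>) q.execute();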
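As I understand it, the datastore serves several equality filters like these by merge-joining the individual index scans (often called a "zigzag merge join"), which is why no custom index is needed. A simplified in-memory sketch of that idea, assuming each tag's index scan is a sorted array of unique post IDs; `MergeJoinSketch` is a hypothetical name and this is not the datastore's actual implementation.

```java
import java.util.*;

// Simplified merge join over per-tag "index scans": each array holds the IDs of
// posts carrying one tag, sorted ascending and without duplicates (assumption).
class MergeJoinSketch {
    static List<Long> intersectSorted(long[][] sortedIdLists) {
        List<Long> out = new ArrayList<>();
        if (sortedIdLists.length == 0) {
            return out;
        }
        int[] pos = new int[sortedIdLists.length];
        while (true) {
            // The largest current ID is the smallest value every list could still share.
            long max = Long.MIN_VALUE;
            for (int i = 0; i < sortedIdLists.length; i++) {
                if (pos[i] >= sortedIdLists[i].length) {
                    return out; // one scan is exhausted, so no more matches are possible
                }
                max = Math.max(max, sortedIdLists[i][pos[i]]);
            }
            boolean allEqual = true;
            for (int i = 0; i < sortedIdLists.length; i++) {
                // Skip ahead in this scan to the first ID >= max (the "zigzag" step).
                while (pos[i] < sortedIdLists[i].length && sortedIdLists[i][pos[i]] < max) {
                    pos[i]++;
                }
                if (pos[i] >= sortedIdLists[i].length) {
                    return out;
                }
                if (sortedIdLists[i][pos[i]] != max) {
                    allEqual = false;
                }
            }
            if (allEqual) {
                out.add(max); // every scan contains this post
                for (int i = 0; i < pos.length; i++) {
                    pos[i]++;
                }
            }
        }
    }

    public static void main(String[] args) {
        long[] java = {1, 2, 3, 5};
        long[] appengine = {2, 3, 4, 5};
        System.out.println(intersectSorted(new long[][] {java, appengine}));
    }
}
```

Because the scans are sorted, each one is walked at most once and the matching IDs come out in order, without loading any full result set into memory first.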

For all posts with either the java or the appengine tag, you would need to run one query per tag and then combine the results yourself. The datastore doesn't handle OR/IN-type operations right now.
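A minimal sketch of "one query per tag, then combine": the per-tag query results are modeled as a hypothetical in-memory map from tag to post IDs, and the union is de-duplicated because a post tagged both java and appengine comes back from both queries. `TagUnion` and `postsWithAnyTag` are illustrative names, not App Engine API.

```java
import java.util.*;

// Hypothetical stand-in for the datastore: tag -> IDs returned by the query tags == tag.
class TagUnion {
    static Set<Long> postsWithAnyTag(Map<String, List<Long>> queryResultsByTag, List<String> tags) {
        Set<Long> combined = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
        for (String tag : tags) {
            // One query per tag; an unknown tag contributes nothing.
            combined.addAll(queryResultsByTag.getOrDefault(tag, Collections.emptyList()));
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> results = new HashMap<>();
        results.put("java", Arrays.asList(1L, 2L));
        results.put("appengine", Arrays.asList(2L, 3L));
        System.out.println(postsWithAnyTag(results, Arrays.asList("java", "appengine")));
    }
}
```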

Finding related posts sounds tricky. I'll think about that after some coffee.

Upvotes: 5
