user1234813
user1234813

Reputation:

App Engine Datastore: entity design and query optimization

I have a system where users can vote on entities, if they like or hate them. It will be bazillion votes and trazillion records, hopefully, some time in the future :)

At the moment i store a vote in an Entity like this:

UserRecordVote: recordId, userId, hateOrLike

And when i want to get every Record the user liked i do a query like this:

I query the "UserRecordVote" table for all the "likes", then i take the recordIds from that resultset, create a key of that property and get the record from the Record Table.

Then i aggregate all that in a list and return it.

Here's the question:

I came up with a different approach and i want to find out if that one is 1. faster and 2. how much is the difference in cost.

I would create an Entity which's name would be userId + "likes" and the key would be the record id:

new Entity(userId + "likes", recordId)

So when i would do a query to get all the likes i could simply query for all, no filters needed. AND i could just grab the entity key! which would be much cheaper if i remember the documentation of app engine right. (can't find the pricing page anymore). Then i could take the Iterable of keys and do a single get(Iterable keys). Ok so i guess this approach is faster and cheaper right? But what if i want to grab all the votes of a user or better said, i want to grab all the records a user didn't vote on yet.

Here's the real question:

I wan't to load all the records a user didn't vote on yet: So i would have entities like this:

new Entity(userId+"likes", recordId);

and

new Entity(userId+"hates", recordId);

I would query both vote tables for all entity keys and query the record table for all entity keys. Then i would remove all the record entity keys matching one of the vote entity keys and with the result i would get(Iterable keys) the full entities and have all the record entites which are not in one of the two voting tables.

Is that a useful approach? Is that the fastest and cost efficient way to do a datastore query? Am i totally wrong and i should store the information as list properties?

EDIT: With that approach i would have 2 entity groups for each user, which would result in million different entity groups, how would GAE Datastore handle that? Atleast the Datastore Viewer entity select box would probably crash :) ?

Upvotes: 1

Views: 785

Answers (2)

stevep
stevep

Reputation: 959

Consider serializing user hate/like votes in two separate TextProperties inside the entity. Use the userId as key_name.

rec = UserRecordVote.get_by_key_name(userId)
hates = len(rec.hates.split('_'))
etc.

Upvotes: 0

dragonx
dragonx

Reputation: 15143

To answer the Real Question, you probably want to have your hateOrLike field store an integer that indicates either hated/liked/notvoted. Then you can filter on hateOrLike=notVoted.

The other solutions you propose with the dynamically named entities make it impossible to query on other aspects of your entities, since you don't know their names.

The other thing is you expect this to be huge, you likely want to keep a running counter of your votes rather than tabulating every time you pull up a UserRecord - querying all the votes, and then calculating them on each view is very slow - especially since App Engine will only return 1000 results on each query, and if you have more than 1000 votes, you'll have to keep making repeated queries to get all the results.

If you think people will vote quickly, you should look into using a sharded counter for performance. There's examples of that with code available if you do a google search.

Upvotes: 1

Related Questions