Reputation: 5879
If I had an entity with an indexed property, say "name," what would the performance of == queries on that property be like?
Of course, I understand that no exact answers are possible, but how does the performance correlate with the total number of entities for which name == x for some x, the total number of entities in the datastore, etc.?
How much slower would a query on name == x be if I had 1000 entities with name equalling x, versus 100 entities? Has any sort of benchmarking been done on this?
Upvotes: 0
Views: 614
Reputation: 12742
AppEngine does queries in a very optimized way, so from a performance standpoint it is virtually irrelevant whether you do a query on the name property or just do a batch-get with the keys only. Either will be linear in the number of entities returned, so returning 1000 entities will be pretty much exactly 10 times slower than returning 100. The total number of entities stored in your database does not make a difference. What does make a tiny difference, though, is the number of different values for "name" that occur in your database.
This is done via the indices (or indexes, if you prefer) stored along with your data. An index for the "name" property consists of a table that holds all names sorted in alphabetical order (and a second one sorted in reverse alphabetical order, if you use descending order in any of your queries). A query then simply finds the first occurrence of the name you are querying in the table and returns results in order from there. This is called a "scan".
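To make the scan idea concrete, here is a minimal sketch in plain Python. It is only a toy model, not the datastore's actual implementation: the index is assumed to be an in-memory sorted list of (name, key) pairs, and build_index and scan_equal are hypothetical helper names invented for this illustration.

```python
import bisect

def build_index(entities):
    """Toy stand-in for a property index: (name, key) pairs in sorted order.

    `entities` is assumed to be a dict mapping an integer key to a dict
    that has a "name" field.
    """
    return sorted((entity["name"], key) for key, entity in entities.items())

def scan_equal(index, name):
    """Return the keys of every index row whose name equals `name`."""
    # Binary search for the first row with this name. The 1-tuple (name,)
    # sorts before every (name, key) pair, so this lands on the first match.
    i = bisect.bisect_left(index, (name,))
    keys = []
    # Walk forward one row per result and stop at the first non-matching row.
    while i < len(index) and index[i][0] == name:
        keys.append(index[i][1])
        i += 1
    return keys

entities = {
    1: {"name": "alice"},
    2: {"name": "bob"},
    3: {"name": "alice"},
    4: {"name": "carol"},
}
index = build_index(entities)
matches = scan_equal(index, "alice")
```

In this model the work after the initial lookup grows with the number of matching rows, not with the size of the whole index, which is the behaviour described above.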
This video is a bit technical, but it explains in detail how all this works. If you're concerned about coding for maximum performance, it might be a good investment of your time:
Google I/O 2008: Under the Covers of the Google App Engine Datastore
(The video quality is fairly bad, but the slides are also online; see the link above the video.)
Upvotes: 1
Reputation: 15143
Some not very strenuous testing on my part indicated response times increased roughly linearly with the number of results returned. Note that even if you have 1000 entities, if you add a limit=100 to your query, it'll perform the same as if you only had 100 entities.
This is in line with the documentation, which indicates that perf varies with the number of entities returned.
When I say not very strenuous, I mean that the response times were all over the place, so any line drawn through them is a very rough estimate. I'd often see an order-of-magnitude difference in perf on the same request.
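The effect of a limit can be sketched with a toy model in plain Python. This is not the datastore API and not a benchmark: the index is assumed to be a sorted list of (name, key) pairs, and scan_with_limit is a hypothetical function that counts how many index rows it reads.

```python
import bisect

def scan_with_limit(index, name, limit=None):
    """Equality scan over a sorted (name, key) list.

    Returns (keys, rows_read) so the amount of work can be compared.
    """
    # Jump to the first row for this name with a binary search.
    i = bisect.bisect_left(index, (name,))
    keys = []
    rows_read = 0
    while i < len(index) and index[i][0] == name:
        if limit is not None and len(keys) >= limit:
            break  # the limit is reached, so the scan stops early
        keys.append(index[i][1])
        rows_read += 1
        i += 1
    return keys, rows_read

# 1000 rows named "x" plus 500 unrelated rows named "y".
big_index = sorted([("x", k) for k in range(1000)] +
                   [("y", k) for k in range(1000, 1500)])
# Only 100 rows named "x".
small_index = sorted(("x", k) for k in range(100))

limited_keys, limited_reads = scan_with_limit(big_index, "x", limit=100)
small_keys, small_reads = scan_with_limit(small_index, "x")
full_keys, full_reads = scan_with_limit(big_index, "x")
```

In this model the limited scan over 1000 matches reads the same number of rows as the unlimited scan over 100 matches, while the unlimited scan over 1000 matches reads ten times as many.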
Upvotes: 2