YSC
YSC

Reputation: 399

GAE DataStore Retrieving Entities by Keys and Filter by Dates - Slow Performance

I use NDB query to retrieve entities by a list of keys by IN operator and then filter by Dates. The query was broken into several sub queries, which is written in doc. and running in sequence instead of in parallel.

class Post(ndb.Model):
  modified_date = ndb.DateTimeProperty()

# query
posts = Post.query(
  Post.key.IN(keys),
  Post.modified_date >= start_date,
  Post.modified_date <= end_date).fetch()

The query profiling graph shows sequentially run sub queries. It takes about 0.3 seconds for 25 keys. The query latency is linear to number of keys to get.

query profiling

Is there any way to optimize the query, and what is the best practice to retrieve entities by keys and filter by date range?

Upvotes: 1

Views: 372

Answers (1)

janscas
janscas

Reputation: 629

The problem is with the IN operator.

For each key in keys GAE will perform an individual query.

From the GAE docs:

The IN operator also performs multiple queries: one for each item in the specified list, with all other filters unchanged and the IN filter replaced with an equality (=) filter. The results are merged in order of the items in the list. If a query has more than one IN filter, it is performed as multiple queries, one for each possible combination of values in the IN lists.

A single query containing not-equal (!=) or IN operators is limited to no more than 30 subqueries.

https://cloud.google.com/appengine/docs/standard/python/datastore/queries

The modified_date is ok, because there is an index and therefore is efficient.

Upvotes: 2

Related Questions