Zied Hamdi
Zied Hamdi

Reputation: 2660

How to sort an optimized keys only query

I'm using datastore native api to access to gae database (for well studied specific reasons). I wanted to optimize the code and use the memcache in my requests instead of directly grabbing the values, the issue, is that my query is sorted.

When I do a findProductsByFiltersQuery.setKeysOnly(); on my query, I receive this error:

The provided keys-only multi-query needs to perform some sorting in memory. As a result, this query can only be sorted by the key property as this is the only property that is available in memory.

The weired thing is that it starts happening from a certain complexity of the request, for example this request fails:

SELECT __key__ FROM Product WHERE dynaValue = _Rs:2 AND productState = PUBLISHED AND dynaValue = _RC:2 AND dynaValue = RF:1 AND dynaValue = ct:1030003 AND dynaValue = _RS:4 AND dynaValue = _px:2 AND itemType = NEWS ORDER BY modificationDate DESC

while this one passes :

SELECT __key__ FROM Product WHERE itemType = CI AND productState = PUBLISHED ORDER BY modificationDate DESC

Can someone explain me why this is happening and if ordering is not possible when getting the keys, for what is that feature? : since results are paginated, it is useless to get a bad set of keys from the first filtering request. So how is it thought???

Please also notice that when I do non keysOnly very long request I receive this message

Splitting the provided query requires that too many subqueries are merged in memory.

at com.google.appengine.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)
at com.google.appengine.api.datastore.QuerySplitHelper.splitQuery(QuerySplitHelper.java:99)
at com.google.appengine.api.datastore.QuerySplitHelper.splitQuery(QuerySplitHelper.java:71)

Can someone explain me how is it possible there is in memory treatment when values are indexed? or is it the devmode server only that does this error?

Upvotes: 1

Views: 475

Answers (1)

Patrick Costello
Patrick Costello

Reputation: 3626

In-memory queries are necessary when you use OR, IN, and != operators in Datastore. As described in this blog post, queries using these operators are split in the client into multiple Datastore queries. For example:

SELECT * FROM Foo WHERE A = 1 OR A = 2

gets split into two queries:

SELECT * FROM Foo WHERE A = 1
SELECT * FROM Foo WHERE A = 2

If you add ORDER BY B to your query, both sub-queries get this order:

SELECT * FROM Foo WHERE A = 1 ORDER BY B
SELECT * FROM Foo WHERE A = 2 ORDER BY B

While each of these Datastore queries returns results sorted by B, the union of the queries is not. On the client side, the SDK merges the results from the two ordered by B.

In order to do this, the Datastore queries must actually return the ordered property, otherwise the SDK won't know the correct way to merge these together.

If you are writing queries with a large number of filters, make sure to only use AND filters. This will allow all the operations to be performed only in the Datastore, in which case no in-memory sorting is necessary.

Upvotes: 1

Related Questions