Pravesh Jain

Reputation: 4288

Counting entities in a particular range in Google App Engine Datastore

I am new to Google Datastore. I am building a simple app and I want to count the number of entities satisfying a particular criterion. The obvious way to do this is to first query for those entities and then take the count of the result:

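import com.google.appengine.api.datastore.*;
import com.google.appengine.api.datastore.Query.Filter;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;

// Filter on the "height" property of the "Person" kind (minHeight is defined elsewhere)
Filter heightFilter = new FilterPredicate("height",
                  FilterOperator.GREATER_THAN_OR_EQUAL,
                  minHeight);

// Use class Query to assemble a query
Query q = new Query("Person").setFilter(heightFilter);

// Use the PreparedQuery interface to retrieve results
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
PreparedQuery pq = datastore.prepare(q);

// And now count.
int count = pq.countEntities(FetchOptions.Builder.withDefaults());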

I was wondering if this is the best way to carry out this particular task. The query would go through all entities of this kind (Person in this case) and match them one by one. Is there a better way to do this if I just need the count and not the entities themselves?

Upvotes: 1

Views: 275

Answers (2)

jirungaray

Reputation: 1674

The following snippet returns only the keys, not whole entities. Keys-only queries are considered "small operations" and are therefore both faster and, more importantly, free.

Query q = new Query("Person").setFilter(heightFilter).setKeysOnly();

Upvotes: -1

Jaime Gomez

Reputation: 7067

The Datastore is not designed for this kind of query. Counting is hard to scale, so you're encouraged to come up with a solution tailored to your particular needs.

In this case, you could have a model that keeps the count for you and update it every time you add or remove a Person. Then, to get the count, you just fetch that one entity and read the count. This is perfect for reads: fast and cheap.
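A minimal in-memory sketch of the counter-entity idea, assuming plain Java collections as a stand-in for the Datastore: the `persons` map plays the role of the Person entities, the `personCount` field plays the role of the single counter entity, and `synchronized` stands in for the datastore transaction that would wrap both writes. The class and method names (`PersonStore`, `addPerson`, `removePerson`, `getPersonCount`) are hypothetical and are not part of any App Engine API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Datastore.
// "persons" ~ the Person entities, "personCount" ~ one counter entity.
// synchronized ~ a datastore transaction covering both writes (assumption).
class PersonStore {
    private final Map<String, Integer> persons = new HashMap<>();
    private long personCount = 0;

    // Write path: store the Person and update the counter together.
    synchronized void addPerson(String name, int height) {
        if (persons.put(name, height) == null) {
            personCount++; // only count genuinely new Persons
        }
    }

    synchronized void removePerson(String name) {
        if (persons.remove(name) != null) {
            personCount--; // only decrement if a Person was actually removed
        }
    }

    // Read path: a single lookup, no query over the Person entities.
    synchronized long getPersonCount() {
        return personCount;
    }
}
```

The cost is paid on every write (two updates instead of one), which is what keeps the read down to a single fetch.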

The problem now is on writes: since you want to do this transactionally (to keep the count accurate), there may come a time in your application when there are more than 1-5 updates per second, and transactions will need to retry and could become a bottleneck. One popular solution at this stage is to use sharded counters, which distribute the counting across multiple entities so that write throughput scales up.
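A rough in-memory sketch of a sharded counter, assuming each slot of an `AtomicLongArray` stands in for one counter-shard entity (a hypothetical "CounterShard" kind). In the Datastore each shard would be updated in its own small transaction; here the atomic array operations play that role. The class name `ShardedCounter` and the shard count are illustrative choices, not an App Engine API.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sharded counter: each array slot ~ one shard entity.
// Writers update a randomly chosen shard, so concurrent writes rarely
// contend on the same entity; readers sum all shards.
class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        this.shards = new AtomicLongArray(numShards);
    }

    // Write path: touch only one randomly chosen shard.
    void increment() {
        shards.incrementAndGet(randomShard());
    }

    // A single shard may go negative; only the sum is meaningful.
    void decrement() {
        shards.decrementAndGet(randomShard());
    }

    // Read path: fetch every shard and add them up.
    long getCount() {
        long total = 0;
        for (int i = 0; i < shards.length(); i++) {
            total += shards.get(i);
        }
        return total;
    }

    private int randomShard() {
        return ThreadLocalRandom.current().nextInt(shards.length());
    }
}
```

The trade-off is visible in the two paths: more shards spread the writes further, but a read now has to fetch all of them instead of a single counter entity.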

I would advise you to keep it simple and only move to more advanced techniques when you really need them, to keep complexity under control. And never count at read time: the idea with this technology (NoSQL) is to pay the cost up front, on write, so that reads are as fast and efficient as possible.

Upvotes: 2
