Cristiano Ghersi

Reputation: 2122

JDO on GoogleAppEngine: How to count and group with BigTable

I need to collect some statistics on my entities in the datastore.

For example, I need to know how many objects of a kind I have, how many objects have some properties set to particular values, etc. In a typical relational DBMS I could use

    SELECT COUNT(*) ... WHERE property=<some value>

or

    SELECT MAX(<some field>), ... GROUP BY property

etc. But I cannot find any equivalent constructs here.

Moreover, I cannot load all the objects into memory (e.g. using pm.getExtent(MyCall.class, false)) because I have too many entities (more than 100k).

Do you know any trick to achieve my goal?

Upvotes: 4

Views: 1726

Answers (2)

Kal

Reputation: 24910

Support for aggregate functions is limited on GAE, primarily as an artifact of the schema-less nature of BigTable. The alternative is to maintain the aggregate values yourself, as separate fields, so that they can be read quickly.
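A minimal sketch of what maintaining the aggregates yourself could look like, written as plain Java so it runs without App Engine. StatsCounter and its methods are hypothetical names; the HashMaps stand in for counter values that a real app would persist in the datastore (for example in a dedicated statistics entity, usually sharded under heavy write load) and update in the same code path that saves each entity. With this in place, the COUNT ... WHERE and MAX ... GROUP BY queries from the question become lookups rather than scans.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: aggregates are updated at write time, so that
 * "COUNT(*) ... WHERE property = v" and "MAX(field) ... GROUP BY property"
 * become lookups instead of full scans. The maps stand in for counter
 * values that would be persisted in the datastore.
 */
class StatsCounter {
    private long totalCount = 0;
    private final Map<String, Long> countByGroup = new HashMap<>();
    private final Map<String, Long> maxByGroup = new HashMap<>();

    /**
     * Call whenever a new entity is saved.
     * groupValue: the value of the property you filter or group by.
     * amount: the numeric field you want the MAX of.
     */
    void onEntitySaved(String groupValue, long amount) {
        totalCount++;
        countByGroup.merge(groupValue, 1L, Long::sum);
        maxByGroup.merge(groupValue, amount, Long::max);
    }

    /** Like SELECT COUNT(*). */
    long count() {
        return totalCount;
    }

    /** Like SELECT COUNT(*) ... WHERE property = groupValue. */
    long countWhere(String groupValue) {
        return countByGroup.getOrDefault(groupValue, 0L);
    }

    /** Like one row of SELECT MAX(field) ... GROUP BY property; null if the group is empty. */
    Long maxWhere(String groupValue) {
        return maxByGroup.get(groupValue);
    }

    public static void main(String[] args) {
        StatsCounter stats = new StatsCounter();
        stats.onEntitySaved("red", 10);
        stats.onEntitySaved("blue", 7);
        stats.onEntitySaved("red", 42);
        System.out.println(stats.count());           // prints 3
        System.out.println(stats.countWhere("red")); // prints 2
        System.out.println(stats.maxWhere("red"));   // prints 42
    }
}
```

This sketch only handles inserts. Deletions and updates would need matching decrements, and a MAX cannot be maintained incrementally under deletes without extra bookkeeping, so plan which statistics you need before you start writing data.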

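To do a count, you could do something like this:

    // JPA (javax.persistence.Query); the count may come back as Integer or Long, so go through Number
    Query q = em.createQuery("SELECT count(p) FROM your.package.Class p");
    long count = ((Number) q.getSingleResult()).longValue();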

but this will probably count at most 1000 entities, since GAE limits the number of rows a query fetches to 1000.

Some helpful reading on how to work around these issues:

http://marceloverdijk.blogspot.com/2009/06/google-app-engine-datastore-doubts.html

Is there a way to do aggregate functions on Google App Engine?

Upvotes: 1

Igor Artamonov

Reputation: 35961

Actually it depends on your specific requirements.

That said, a common approach is to prepare these statistics in the background.

For example, you can run a few tasks using the Task Queue service. Each task runs a query like "select x where x.property == some value" and takes a cursor and a running sum as parameters. On the first step the cursor is empty and the sum is zero. The task then iterates over the query results for up to 1000 items (the query limit) or 9 minutes (the task limit), incrementing the sum for each item; if it has not finished, it adds the next step to the queue with the new cursor and sum values. A cursor is easily serialized to a string.

On the final step, save the resulting value somewhere, e.g. into a stats results table.
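The chained-task approach above can be sketched in plain Java. This is only a simulation under stated assumptions: no App Engine APIs are used, an ArrayDeque stands in for the Task Queue, the hypothetical fetchBatch method stands in for a filtered datastore query that resumes from a cursor, and the "cursor" is just a scan position serialized as a string. A real implementation would use the datastore's own query cursor and pass the cursor and running sum as task parameters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical simulation of counting matching entities with chained tasks.
 * No App Engine APIs are used: the deque stands in for the Task Queue and
 * the cursor is simply a scan position serialized as a string.
 */
class ChainedCounter {
    static final int BATCH_SIZE = 1000; // per-step limit, as in the answer

    /** One queued step: where to resume and the running sum so far. */
    static final class Task {
        final String cursor; // null on the first step
        final long sum;
        Task(String cursor, long sum) { this.cursor = cursor; this.sum = sum; }
    }

    /** One page of query results plus the cursor to resume from (null when done). */
    static final class Batch {
        final List<String> items;
        final String nextCursor;
        Batch(List<String> items, String nextCursor) { this.items = items; this.nextCursor = nextCursor; }
    }

    private final List<String> entities;                  // stand-in for one entity kind
    private final Deque<Task> queue = new ArrayDeque<>(); // stand-in for the Task Queue
    private long result;                                  // stand-in for the stats results table

    ChainedCounter(List<String> entities) {
        this.entities = entities;
    }

    /** Hypothetical query "where property == wanted", resuming at cursor, at most BATCH_SIZE results. */
    Batch fetchBatch(String cursor, String wanted) {
        int pos = (cursor == null) ? 0 : Integer.parseInt(cursor);
        List<String> items = new ArrayList<>();
        while (pos < entities.size() && items.size() < BATCH_SIZE) {
            String value = entities.get(pos++);
            if (value.equals(wanted)) {
                items.add(value);
            }
        }
        String next = (pos < entities.size()) ? String.valueOf(pos) : null;
        return new Batch(items, next);
    }

    /** One task: process a batch, then either enqueue the next step or store the final result. */
    void runTask(Task task, String wanted) {
        Batch batch = fetchBatch(task.cursor, wanted);
        long sum = task.sum + batch.items.size(); // one increment per matching entity in this batch
        if (batch.nextCursor != null) {
            queue.add(new Task(batch.nextCursor, sum)); // not finished: chain the next step
        } else {
            result = sum; // final step: save the result
        }
    }

    /** Kick off the chain with an empty cursor and a zero sum, then drain the queue. */
    long count(String wanted) {
        queue.add(new Task(null, 0));
        while (!queue.isEmpty()) {
            runTask(queue.poll(), wanted);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> entities = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            entities.add(i % 2 == 0 ? "red" : "blue");
        }
        System.out.println(new ChainedCounter(entities).count("red")); // prints 1250
    }
}
```

Carrying the cursor and the sum inside each task keeps every step short and independent, which is what lets the total run past the per-query and per-request limits.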

Take a look at:

Also, the best way to do this kind of statistics/aggregation really depends on your actual task, requirements and project; there are a few ways to accomplish it, each optimal for different tasks. There is no standard way, as there is in SQL.

Upvotes: 2
