JohnnyBack
JohnnyBack

Reputation: 145

Performing large queries on Google App Engine Datastore

I'm working on a server built with Spring. We have a very simple query (get all entities of a single kind) that fails to run after a certain threshold of records (~1,000,000).

This is the first method that I've tried (modified from the Google documentation):

PersistenceManager pm = persistenceManagerFactory.getPersistenceManager();
List<T> Qresults;
List<T> detachedList =  new LinkedList<T>();
try {
    Query query = pm.newQuery(cla);
    query.setRange(0, 200);
    List<T> results = (List<T>) query.execute();
    Cursor cursor = JDOCursorHelper.getCursor(results);
    String cursorString = cursor.toWebSafeString();
    do {
        cursor = Cursor.fromWebSafeString(cursorString);
        Map<String, Object> extensionMap = new HashMap<String, Object>();
        extensionMap.put(JDOCursorHelper.CURSOR_EXTENSION, cursor);
        query.setExtensions(extensionMap);
        query.setRange(0, 200);
        Qresults = (List<T>) query.execute();
        results.addAll(Qresults);
    } while(Qresults.size() == 200);

    for (T item : results) {
        detachedList.add(pm.detachCopy(item));
    }
    return detachedList;
} finally {
    pm.close();
}

After that I've tried avoiding the PersistenceManager (I don't have any experience working with Spring or persistence):

JSONArray array = new JSONArray();
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("MobileDevice");
FetchOptions options = FetchOptions.Builder.withChunkSize(1000);
PreparedQuery pq = datastore.prepare(q);

for (Entity result : pq.asIterable(options)) {
    array.put(new MobileDevice(result).toJSON());
}
return new transactionResult(array.toString());

The server threw a timeout exception on both cases. Am I doing something wrong? Feels like this should be a simple operation.

Upvotes: 1

Views: 167

Answers (1)

Andrei Volgin
Andrei Volgin

Reputation: 41089

Timeout is related to the time of the request execution, not to the number of entities. If you run this query on auto-scaled App Engine instances, you have 60 seconds to complete a request. There is also a limit on how long a datastore query remains available - you need to use cursors if the number of results is large.

The obvious solution is not to return a few million records as a single JSON object, but let the client (or whoever needs this data) get them in manageable chunks that take less than 60 seconds to retrieve.

Upvotes: 2

Related Questions