bashoogzaad

Reputation: 4781

How to balance the load when building a cache from App Engine?

I currently have the following situation, which has bothered me for a couple of months now.

The case

I have built a JavaFX application which serves as a cash register for my shop. The application contains a lot of classes (such as Customer, Transaction, etc.), which are shared with the server API. The server API is hosted on Google App Engine.

Because we also have an online shop, I have chosen to build a cache of the entire database on startup of the application. To do this, I call the GET of my Data API for each class/table:

protected QueryBuilder performGet(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException, ApiException, JSONException {

    Connection conn = connectToCloudSQL();

    log.info("Parameters: "+Functions.parameterMapToString(req.getParameterMap()));

    String tableName = this.getTableName(req);
    log.info("TableName: "+tableName);
    GetQueryBuilder queryBuilder = DataManager.executeGet(conn, req.getParameterMap(), tableName, null);

    //Get the correct method to create the objects
    String camelTableName = Functions.snakeToCamelCase(tableName);
    String parsedTableName = Character.toUpperCase(camelTableName.charAt(0)) + camelTableName.substring(1);

    List<Object> objects = new ArrayList<>();
    try {

        log.info("Parsed Table Name: "+parsedTableName);
        Method creationMethod = ObjectManager.class.getDeclaredMethod("create"+parsedTableName, ResultSet.class, boolean.class);

        while (queryBuilder.getResultSet().next()) {

            //Create new objects with the ObjectManager
            objects.add(creationMethod.invoke(null, queryBuilder.getResultSet(), false));
        }
        log.info("List of objects created");

        creationMethod = null;
    }
    catch (Exception e) {
        camelTableName = null;
        parsedTableName = null;
        objects = null;
        throw new ApiException(e, "Something went wrong while iterating through ResultSet.", ErrorStatus.NOT_VALID);
    }

    Functions.listOfObjectsToJson(objects, res.getOutputStream());

    log.info("GET Request succeeded");

    //Clean up objects
    camelTableName = null;
    parsedTableName = null;
    objects = null;

    closeConnection(conn);

    return queryBuilder;

}

It simply gets every row from the requested table in my Cloud SQL database. Then it creates the objects, using the class that is shared with the client application. Lastly, it converts these objects to JSON using Gson. Some of my tables have 10,000+ rows, and then it takes approx. 5-10 seconds to do this.
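
For reference, here is a minimal sketch of what such a serialization helper might look like. This is not my actual Functions.listOfObjectsToJson; the Gson call and the streaming to the servlet output stream are just an illustration:

import com.google.gson.Gson;

import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class JsonFunctions {

    private static final Gson GSON = new Gson();

    // Serializes the list straight into the response stream instead of
    // building the whole JSON string in memory first, which matters for
    // tables with 10,000+ rows.
    public static void listOfObjectsToJson(List<Object> objects, OutputStream out) throws IOException {
        try (OutputStreamWriter writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            GSON.toJson(objects, writer);
        }
    }
}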

On the client, I convert this JSON back to a list of objects using the same shared class. First I load the essential classes sequentially (because otherwise the application won't start), and after that I load the rest of the classes in the background with separate threads. A rough sketch of that loading follows below.
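
The table names, the cache map, and the loadTable helper here are simplified placeholders, not my actual code:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ClientCacheLoader {

    // In-memory cache: table name -> deserialized objects (placeholder structure).
    private final Map<String, List<Object>> cache = new ConcurrentHashMap<>();

    public void loadCache() {
        // Essential tables are loaded sequentially so the application can start.
        for (String table : Arrays.asList("customer", "transaction")) {
            cache.put(table, loadTable(table));
        }

        // The remaining ~15 tables are loaded in the background with separate threads.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String table : Arrays.asList("product", "supplier" /* ... */)) {
            pool.submit(() -> cache.put(table, loadTable(table)));
        }
        pool.shutdown();
    }

    // Placeholder for the HTTP GET to the Data API plus the Gson deserialization.
    private List<Object> loadTable(String table) {
        throw new UnsupportedOperationException("HTTP call omitted in this sketch");
    }
}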

The problem

Every time I load up the cache, there is a chance (about 1 in 4) that the server responds with a DeadlineExceededException on some of the bigger tables. I believe this has something to do with Google App Engine not being able to fire up a new instance in time, so the computation time exceeds the limit.

I know it has something to do with loading the objects in background threads, because these all start at the same time. When I delay the start of these threads by 3 seconds, the error occurs a lot less often, but it is still present. Because the application loads 15 classes in the background, delaying them is not ideal, since the application will only work partly until it is done. It is also not an option to load everything before starting, because that would take more than 2 minutes.

Does anyone know how to set up some load balancing on Google App Engine for this? I would like to solve this on the server side.

Upvotes: 1

Views: 96

Answers (1)

konqi

Reputation: 5227

You clearly have an issue with warm-up requests and a query that takes quite long. You have the usual options:

  • Do some profiling and reduce the cost of your method invocations
  • Use caching (memcache) to cache some of the result; see the sketch right below
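
A minimal sketch of the memcache option, assuming a hypothetical buildJsonForTable helper that wraps your existing query-and-serialize step:

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class CachedTableJson {

    // Serves the JSON for a table from memcache when possible, rebuilding it on a miss.
    // Keep in mind that a single memcache value is limited to 1 MB, so very large
    // tables may need to be split or compressed.
    public static String getTableJson(String tableName) {
        MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();
        String key = "table-json:" + tableName;
        String json = (String) memcache.get(key);
        if (json == null) {
            json = buildJsonForTable(tableName); // hypothetical: your query + Gson serialization
            memcache.put(key, json, Expiration.byDeltaSeconds(600)); // cache for 10 minutes
        }
        return json;
    }

    private static String buildJsonForTable(String tableName) {
        throw new UnsupportedOperationException("query + serialization omitted in this sketch");
    }
}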

If those options don't work for you, you should parallelize your computations. One thing that comes to mind is that you could reliably reduce request times if you simply split your request into multiple parallel requests, like so:

  • Let's say your table contains 5k rows.
  • Then you create 50 requests, each handling 100 rows.
  • Aggregate the results on the server or client side and respond.

It'll be quite tough to do this on just the server side, but it should be possible if your now (much) smaller tasks return within a couple of seconds. A rough client-side sketch of the splitting follows below.
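
The limit/offset parameters and the fetchChunk helper here are assumptions about your Data API, not something it necessarily supports today:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedLoader {

    // Fetches one table in parallel chunks and aggregates the results.
    public static List<Object> loadInChunks(String tableName, int totalRows, int chunkSize)
            throws InterruptedException, ExecutionException {

        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<List<Object>>> futures = new ArrayList<>();

        for (int offset = 0; offset < totalRows; offset += chunkSize) {
            final int chunkOffset = offset;
            futures.add(pool.submit(() -> fetchChunk(tableName, chunkOffset, chunkSize)));
        }

        List<Object> all = new ArrayList<>();
        for (Future<List<Object>> future : futures) {
            all.addAll(future.get()); // blocks until each chunk has arrived
        }
        pool.shutdown();
        return all;
    }

    // Placeholder for an HTTP GET like /data/{table}?offset=...&limit=...
    private static List<Object> fetchChunk(String tableName, int offset, int limit) {
        throw new UnsupportedOperationException("HTTP call omitted in this sketch");
    }
}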

Alternatively, you could return a job id at once and make the client poll for the result after a couple of seconds. This would, however, require a small change on the client side. It's the better option imho, especially if you want to use a task queue for creating your response. A minimal sketch of that variant follows below.
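
The /tasks/build-cache handler and the polling endpoint in this sketch are assumptions you would still have to implement; only the task queue calls are the actual App Engine API:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

import java.util.UUID;

public class JobStarter {

    // Enqueues the expensive work and immediately returns a job id the client can poll.
    public static String startCacheJob(String tableName) {
        String jobId = UUID.randomUUID().toString();
        Queue queue = QueueFactory.getDefaultQueue();
        // The /tasks/build-cache handler (not shown) would run the query, serialize
        // the result, and store it under jobId, e.g. in memcache or the datastore.
        queue.add(TaskOptions.Builder.withUrl("/tasks/build-cache")
                .param("table", tableName)
                .param("jobId", jobId));
        return jobId; // the client then polls something like /jobs/{jobId} until the result is ready
    }
}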

Upvotes: 1
