Pokuri

Reputation: 3082

Converting 4MB of JSON to Java objects with Jackson takes 1500ms

In my app, one entity is modeled as follows:

class Node {
    private String parentNodeId;
    private Node parentNode;
    // other properties and their getters and setters
}

I am using a denormalized form in a NoSQL DB, so each node holds a reference to its parent node. This way the DB contains 540 records, which amounts to about 4MB of JSON. Fetching those records from the DB does not take much time (70ms), but deserializing them from JSON into Java objects takes nearly 1500ms. Combined, a request takes about 2000ms to complete. The transformation code is as follows:

List<String> records = DB.get("some criteria");
List<Node> results = Lists.newArrayList();
for (String json : records) {
    results.add(convertJSONToObject(json, Node.class));
}

private <T> T convertJSONToObject(String json, Class<T> entityClass) throws IOException {
    if (StringUtils.isBlank(json)) {
        return null;
    }
    ObjectReader reader = MAPPER.reader(entityClass);
    return reader.readValue(json);
}

Is there a better way to reduce the transformation time, or is this speed acceptable for that much data?

Upvotes: 1

Views: 617

Answers (2)

Philzen

Reputation: 4657

Optimizing Jackson Performance

High-level API (low-hanging fruits)

If possible, try feeding Jackson with anything other than a String, because that is the least memory-efficient input.

For instance, using byte[] instead of String consumes 50% less memory and thus would take considerably less processing time, as Jackson author StaxMan (alias Tatu) explains here and here.

Other options – depending on the way your data is coming in – are to feed Jackson directly with a URL, File or InputStream, as recommended in section 3. of the Jackson Performance Best Practices.
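To illustrate, here's a minimal sketch of feeding Jackson raw bytes or a stream instead of a String (parsing to JsonNode so it compiles without the Node class; the method names are made up for the example):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteFeedExample {
    static final ObjectMapper MAPPER = new ObjectMapper();

    // Feed raw UTF-8 bytes -- skips the String's char[] and one full copy
    static JsonNode parseBytes(byte[] utf8Json) throws IOException {
        return MAPPER.readTree(utf8Json);
    }

    // Or hand Jackson whatever stream the DB driver exposes
    static JsonNode parseStream(InputStream in) throws IOException {
        return MAPPER.readTree(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "{\"parentNodeId\":\"root\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseBytes(bytes).path("parentNodeId").asText()); // root
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            System.out.println(parseStream(in).path("parentNodeId").asText()); // root
        }
    }
}
```

In the question's code, that would mean replacing `MAPPER.reader(entityClass).readValue(json)` on a String with the byte[] or InputStream overloads of the same calls.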

Low-level

As section 5. of the above linked Performance Best Practices suggests, using JsonParser or TokenBuffer to programmatically process JSON data should give a real edge.

For completeness: the latter can also be used to deep-clone an object efficiently, because it avoids construction and traversal of the tree model.
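A rough sketch of the JsonParser approach: pull out just the tokens you care about instead of binding whole objects. (The field name and JSON shape here are invented for the example, and this naive scan would also match fields nested in sub-objects.)

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class StreamingSketch {
    // Scan token-by-token and return the first value of the given field --
    // no POJO, no tree model is ever built
    static String readField(String json, String field) throws Exception {
        JsonFactory factory = new JsonFactory();
        try (JsonParser p = factory.createParser(json)) {
            while (p.nextToken() != null) {
                if (p.getCurrentToken() == JsonToken.FIELD_NAME
                        && field.equals(p.getCurrentName())) {
                    p.nextToken(); // advance to the field's value
                    return p.getText();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"parentNodeId\":\"n-7\",\"name\":\"leaf\"}";
        System.out.println(readField(json, "parentNodeId")); // n-7
    }
}
```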

Data Model: Java Object vs. Data Transfer Object

This question may be an example of an X-Y problem, as it's possible this data structure simply doesn't scale – at least not when serialized as-is. We'd need to know more about what the data model looks like to be sure. However, what immediately comes to mind is:

  1. Using parentNodeId and parentNode in the Java model seems redundant, as the latter should already provide you with access to the former via parentNode.id.
  2. The Node model as shown bears the possibility of cyclic dependencies – which could exist fine as a Java object model, but break when being serialized. Even if your application ensures they don't happen, your serialized representation may be inefficient, for instance if similar nodes (that have same nodeId and/or data, but only a different parentNode) appear in your tree.
    If that is the case, my approach would be to build a custom NodeSerializer and NodeDeserializer to handle the transformation to and from a more efficient data transfer format. You can conveniently register these using the @JsonSerialize and @JsonDeserialize annotations on your Node class, so they are picked up automatically by any ObjectMapper instance (even within Spring Boot, for instance). The proposed data format could then be a "hydrated" representation, providing a list of unique nodes in one property and their connections in another.
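A compact sketch of that registration mechanism, assuming a made-up transfer format that sends only the node's id and its parent's id (the short key "p" and the re-linking step are my invention, not anything from the question):

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.*;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;
import java.io.IOException;

// Annotations make any ObjectMapper pick the custom (de)serializers up automatically
@JsonSerialize(using = NodeDemo.NodeSerializer.class)
@JsonDeserialize(using = NodeDemo.NodeDeserializer.class)
class Node {
    String id;
    String parentNodeId;
    Node parentNode; // left null on read; a post-processing pass could re-link it by id
}

public class NodeDemo {
    static class NodeSerializer extends JsonSerializer<Node> {
        @Override
        public void serialize(Node n, JsonGenerator gen, SerializerProvider sp) throws IOException {
            gen.writeStartObject();
            gen.writeStringField("id", n.id);
            gen.writeStringField("p", n.parentNodeId); // short key, no nested parent object
            gen.writeEndObject();
        }
    }

    static class NodeDeserializer extends JsonDeserializer<Node> {
        @Override
        public Node deserialize(JsonParser p, DeserializationContext ctx) throws IOException {
            JsonNode tree = p.getCodec().readTree(p);
            Node n = new Node();
            n.id = tree.get("id").asText();
            n.parentNodeId = tree.get("p").asText();
            return n;
        }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        Node n = new Node();
        n.id = "child";
        n.parentNodeId = "root";
        String json = mapper.writeValueAsString(n);
        System.out.println(json); // {"id":"child","p":"root"}
        Node back = mapper.readValue(json, Node.class);
        System.out.println(back.parentNodeId); // root
    }
}
```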

But as said, real sound advice would require more information regarding the problem domain to understand what is being modelled here.

Upvotes: 1

Danikov

Reputation: 725

Document parsers can be rather heavy-weight, as they hold the object model completely in memory, and those models can be complex (potentially lots of POJOs).

First off, it's worth profiling the deserialization process to rule out problems such as being I/O bound, spending a lot of time in reflection, or some kind of threading contention. There may be issues in there that you can fix or optimize easily.

Secondly, a great deal of performance gain on modern systems can be achieved by multithreading. Maybe look into breaking up your JSON model into pieces and deserializing them in parallel, or seeing if Jackson has an option to do this for you.
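Since each record arrives as its own JSON string, one simple way to parallelize is a parallel stream over the record list. This is a sketch under the assumption that the ObjectMapper is shared (it is thread-safe once configured); with real entities the map step would bind to the entity class instead of a tree:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelDecode {
    static final ObjectMapper MAPPER = new ObjectMapper(); // thread-safe once configured

    // Deserialize each record on the common fork-join pool; with a real entity
    // class the map step would call MAPPER.readValue(json, Node.class) instead
    static List<JsonNode> decodeAll(List<String> records) {
        return records.parallelStream()
                .map(json -> {
                    try {
                        return MAPPER.readTree(json);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<JsonNode> nodes = decodeAll(List.of("{\"a\":1}", "{\"b\":2}"));
        System.out.println(nodes.size()); // 2
    }
}
```

Whether this helps depends on record sizes and core count; for 540 small records the per-task overhead may eat the gain, so measure before committing.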

If you are going to require these objects on a regular basis and your data has some lifespan, you might want to consider caching these objects and having a mechanism to invalidate or update them at an appropriate time. You should also consider excluding fields that you aren't going to use.

Another thing to look at is whether you need the entire object deserialized right away. I believe Jackson has the ability to provide field-level access, so while you wait the 1500ms for full deserialization, you could provide temporary access via that route and deserialize only the required fields. Alternatively, embrace that approach entirely: why deserialize what you don't need?
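One concrete form of that idea is Jackson's tree model: parse once, then pull individual fields on demand without ever materializing a POJO graph. A minimal sketch (the field names are invented for the example):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

public class LazyAccess {
    static final ObjectMapper MAPPER = new ObjectMapper();

    // Parse to the tree model once, then read just the fields you need --
    // untouched subtrees are never turned into Java objects
    static String parentId(String json) throws IOException {
        JsonNode tree = MAPPER.readTree(json);
        return tree.path("parentNodeId").asText();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"parentNodeId\":\"root\",\"payload\":{\"big\":\"...\"}}";
        System.out.println(parentId(json)); // root
    }
}
```

Note the whole document is still tokenized; what's skipped is the cost of constructing and populating the POJOs.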

To take that even further, if you are processing all the objects, you might want to consider a streaming parser instead. This would be more suited if this is part of a workflow and doesn't preclude forking off an object model in the process.

If you're unsure about Jackson's performance, it may be worth profiling alternatives to see if they do any better. Under some conditions GSON has proven to be a lot faster than Jackson; JSONP and JSON.simple also exist. Benchmarking your use case will give you the best idea of which will perform best for you.
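For a first rough measurement, a hand-rolled timing loop like the sketch below can work (swap the parse call for another library's equivalent to compare); for anything you'd act on, a proper harness such as JMH is the better tool, since JIT warm-up and dead-code elimination easily distort naive loops:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class Bench {
    static final ObjectMapper MAPPER = new ObjectMapper();

    // Time n parses of a small document; returns elapsed microseconds
    static long timeParses(int n) throws Exception {
        String json = "{\"parentNodeId\":\"root\"}";
        for (int i = 0; i < n; i++) {
            MAPPER.readTree(json); // warm-up pass so the JIT doesn't dominate
        }
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            MAPPER.readTree(json);
        }
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("10k parses took " + timeParses(10_000) + " microseconds");
    }
}
```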

Upvotes: 1
