Mark J Miller

Reputation: 4871

GC pauses causing performance issue

I just started working on a project (I'm new to the project) that, as a performance optimization, loads 32 GB of graph data (nodes, edges, etc.) into memory and keeps it there. This is a long-running service, so the data is meant to remain in memory for the lifetime of the service. When the CLR triggers a Gen 2 collection there are large pauses (of course), which hurt performance while the GC scans Gen 2 and marks everything as reachable.

What I'd like to know is: are there strategies available for managed applications that must keep large amounts of data in memory? What are the best ways to prevent Gen 2 collections from running, ever?

Upvotes: 1

Views: 188

Answers (1)

russw_uk

Reputation: 1267

There are a few general things that you can do in your implementation to make it more GC-friendly. A relatively easy one is to reduce the number of object references in your object graph. For example, replace:

class Graph {
    List<Node> roots;
    // ...
}

class Node {
    Node[] outwardEdges;
    // ...
}

With indirect references through Node identifiers:

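class Graph {
    List<Node> roots;
    Node[] allNodes;       // a node's identifier is its index in this array
    // ...
}

class Node {
    int[] outwardEdges;    // indices into Graph.allNodes, not object references
    // ...
}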

or something similar that fits your design. This reduces the number of pointers in the object graph that the collector has to walk.
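To make the idea concrete, here is a minimal sketch in Java (the same layout carries over to C#). It goes one step further than the snippet above, which is an assumption on my part rather than anything the design requires: instead of one int[] per Node, all edges sit in two flat int arrays, so the collector has no per-node references to follow. The class name IndexedGraph and its methods are hypothetical.

```java
// Hypothetical compact graph: nodes are plain ints and all edges live in two
// flat int arrays, so there are no per-node object references to trace.
final class IndexedGraph {
    private final int[] edgeStart;   // edges of node n are edgeTarget[edgeStart[n] .. edgeStart[n + 1])
    private final int[] edgeTarget;  // target node index of each edge

    // Builds the flat layout from a per-node adjacency list of node indices.
    IndexedGraph(int[][] adjacency) {
        int n = adjacency.length;
        edgeStart = new int[n + 1];
        for (int i = 0; i < n; i++) {
            edgeStart[i + 1] = edgeStart[i] + adjacency[i].length;
        }
        edgeTarget = new int[edgeStart[n]];
        for (int i = 0; i < n; i++) {
            System.arraycopy(adjacency[i], 0, edgeTarget, edgeStart[i], adjacency[i].length);
        }
    }

    int nodeCount() {
        return edgeStart.length - 1;
    }

    // Counts the nodes reachable from root (root included) with an iterative DFS.
    int countReachable(int root) {
        boolean[] seen = new boolean[nodeCount()];
        int[] stack = new int[nodeCount()];  // each node is pushed at most once
        int top = 0;
        stack[top++] = root;
        seen[root] = true;
        int count = 0;
        while (top > 0) {
            int node = stack[--top];
            count++;
            for (int e = edgeStart[node]; e < edgeStart[node + 1]; e++) {
                int next = edgeTarget[e];
                if (!seen[next]) {
                    seen[next] = true;
                    stack[top++] = next;
                }
            }
        }
        return count;
    }
}
```

For example, new IndexedGraph(new int[][] { {1, 2}, {2}, {0}, {} }).countReachable(0) visits nodes 0, 1 and 2. Per-node payload (names, weights and so on) would go into parallel arrays indexed by the same node number.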

Shifting the data onto the native heap is another possibility: write a small native library (called through JNI on the JVM, or P/Invoke on the CLR) that gives you the interface to perform the operations you need. This can pay off in other ways. The last time I had a similar problem to solve, we made substantial space savings through this approach because the data set was largely Western text, which occupied far less space encoded as UTF-8. As long as the cost of your graph search is non-trivial, the overhead of the native call is unlikely to be an issue.
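As a rough illustration of keeping text off the managed heap, the sketch below stores strings as UTF-8 bytes in a direct ByteBuffer. Note the swap: this uses a direct buffer rather than a hand-written native library, so it only approximates the approach described above. The class OffHeapStrings, its fixed capacity, and its integer handles are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical append-only store that keeps strings as UTF-8 bytes in a
// direct buffer, whose contents live outside the garbage-collected heap.
final class OffHeapStrings {
    private final ByteBuffer data;  // direct buffer holding all UTF-8 bytes back to back
    private final int[] offsets;    // string i occupies bytes offsets[i] .. offsets[i + 1]
    private int count = 0;

    OffHeapStrings(int capacityBytes, int maxStrings) {
        data = ByteBuffer.allocateDirect(capacityBytes);
        offsets = new int[maxStrings + 1];
    }

    // Appends a string and returns its integer handle.
    // Throws BufferOverflowException if the byte capacity is exceeded.
    int add(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        data.put(utf8);
        offsets[count + 1] = data.position();
        return count++;
    }

    // Decodes the string for a handle; this allocates a short-lived String on the heap.
    String get(int id) {
        int start = offsets[id];
        byte[] utf8 = new byte[offsets[id + 1] - start];
        ByteBuffer view = data.duplicate();  // independent position, same underlying bytes
        view.position(start);
        view.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    int bytesUsed() {
        return data.position();
    }
}
```

Storing "node" and "caf\u00e9" this way takes 4 + 5 = 9 bytes, against 16 bytes for the same eight characters in UTF-16. The trade-off is that every read decodes into a temporary object, which is cheap for the collector to reclaim but not free.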

Upvotes: 1
