Johnny

Reputation: 7351

Getting the average size of a specific class from a running JVM

I'm trying to get the average size of all instances of a certain class in a running JVM. I can create a heap dump with jcmd or similar, but that takes a few seconds, and since this is a production server I'd rather have something faster. jcmd has an option to create a histogram in this format:

 num     #instances         #bytes  class name
----------------------------------------------
   1:         84907      559415720  [I
   2:       9572537      229740888  java.lang.String
   3:        803323      142900392  [C
   4:       3190710      102102720  java.util.Hashtable$Entry

which seems promising, but I believe the byte counts are shallow rather than retained sizes. Is there a way to create the same histogram with retained sizes? Or is there another way to quickly get the average size of instances of a class?
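
For reference, I'm producing the histogram above with something like this (the pid is illustrative):

    jcmd 12345 GC.class_histogram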

I'm aware of the GDB method, but it's not suitable: typing the sequence of commands takes a few seconds, which is roughly how long a heap dump with jcmd takes anyway.

Upvotes: 1

Views: 289

Answers (1)

Stephen C

Reputation: 719426

I do not think that a generic tool that will report a histogram of the "deep sizes" of objects is technically feasible.

Why not?

If the "deep size" of an object depends on the semantics of its class, we have the problem that a general purpose tool doesn't know where the object boundaries are. One could implement a tool with hard-coded knowledge of standard types like String, StringBuffer or HashMap. However:

  • That approach doesn't scale, and it doesn't deal with application and third-party library classes.
  • With some types (e.g. collection types) the semantic boundary (i.e. what the user wants to find out) may be context dependent.
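
To make the scaling problem concrete, here is a minimal sketch (not a real tool) of what hard-coded per-type rules look like. The header and reference sizes are illustrative assumptions for a 64-bit JVM with compressed oops, and the instanceof patterns need Java 16+:

    import java.util.Map;

    public class HardCodedDeepSize {

        // Assumed sizes; real values vary by JVM and flags.
        static final int HEADER = 16;
        static final int REF = 4;

        static long deepSize(Object obj) {
            if (obj instanceof String s) {
                // String object plus its byte[] value array (JDK 9+ compact
                // strings store Latin-1 text as one byte per char).
                return align(HEADER + REF) + align(HEADER + 4 + s.length());
            }
            if (obj instanceof Map<?, ?> m) {
                // Recurse into entries -- but are the keys and values part of
                // the map, or separately owned? The tool cannot know.
                long size = HEADER;
                for (Map.Entry<?, ?> e : m.entrySet()) {
                    size += deepSize(e.getKey()) + deepSize(e.getValue());
                }
                return size;
            }
            // Every other class needs its own hand-written rule: this is the
            // part that does not scale to application and library classes.
            throw new UnsupportedOperationException(obj.getClass().getName());
        }

        static long align(long n) { return (n + 7) & ~7L; }
    }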

Alternatively, if the "deep size" of an object is simply the sum of the sizes of all objects that are reachable from it, there are several problems:

  • There will be significant over-counting. For instance, most of the memory usage for strings is in the character arrays. These arrays will be counted twice: once in the reachability graphs of the String objects, and a second time as character arrays in their own right. For complex objects, the over-counting is likely to render the measure useless.

  • Computing the averages for the histogram would be very expensive. A simple approach would perform a reachability graph walk for each and every object in the heap (see the sketch after this list); I don't know if there is a faster alternative.

  • An alternative would be to avoid double counting by not including the "internal" objects in the instance counts. Unfortunately, you would then get the equally misleading phenomenon that (for example) the internal strings, collections, etc. inside more complex objects disappear from the histogram. That means you can no longer get accurate estimates of their average sizes.
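
Here is a rough sketch of that reachability-sum approach, assuming an Instrumentation instance is available (it has to come from a java.lang.instrument agent) to supply shallow sizes. Note that the char[] inside each String this walk visits also shows up in the histogram's own [C row, which is exactly the double counting described above, and running this for every object in the heap is the expensive part:

    import java.lang.instrument.Instrumentation;
    import java.lang.reflect.Field;
    import java.lang.reflect.Modifier;
    import java.util.ArrayDeque;
    import java.util.Collections;
    import java.util.Deque;
    import java.util.IdentityHashMap;
    import java.util.Set;

    public class ReachableSize {

        // Sums the shallow sizes of every object reachable from root.
        static long reachableSize(Object root, Instrumentation inst)
                throws IllegalAccessException {
            Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            Deque<Object> stack = new ArrayDeque<>();
            stack.push(root);
            long total = 0;
            while (!stack.isEmpty()) {
                Object obj = stack.pop();
                if (!seen.add(obj)) continue;          // visit each object once
                total += inst.getObjectSize(obj);      // shallow size of this node
                if (obj.getClass().isArray()) {
                    if (!obj.getClass().getComponentType().isPrimitive()) {
                        for (Object e : (Object[]) obj) {
                            if (e != null) stack.push(e);
                        }
                    }
                    continue;
                }
                for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
                    for (Field f : c.getDeclaredFields()) {
                        if (Modifier.isStatic(f.getModifiers())) continue;
                        if (f.getType().isPrimitive()) continue;
                        f.setAccessible(true);  // may be blocked by module encapsulation on newer JDKs
                        Object v = f.get(obj);
                        if (v != null) stack.push(v);
                    }
                }
            }
            return total;
        }
    }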

I don't think there is a counting scheme that is actually going to work.


Having said that, the source code for the (OpenJDK) tools you are currently using is available for download. You could modify them to analyze object sizes differently, but I don't imagine that would be "quick", either in dev effort or in the speed of the modified tools.

Upvotes: 1
