bountiful

Reputation: 814

How to deal with memory efficiency of lots of small objects in Java

I have an application which takes data from a file and stores it for later use. Each line in the file corresponds to one Foo object, which contains n pairs of Bar objects, each pair with a distinct Name. Each Bar is made of a single-character String. So I store this data like so:

Foo extends HashMap<Name, Pair<Bar, Bar>>

where Pair<A, B> is my own class which just stores 2 values and provides some methods (equals, hashcode etc).

The problem I have encountered is that when I store n=114 Pair objects (this just happens to be the number in my test data) in my Foo, I would expect a retained size of not much more than 228 bytes, when in fact it is more like 25 kB. This means that with ~1000 Foo objects I need 25 MB of memory rather than 228 kB, which is not really acceptable. (Note: the keys for each Foo object are the same, i.e. fooOne.keySet().equals(fooTwo.keySet()).)

I am using VisualVM to profile my application, and when I delve into an instance of Foo I see:

Field           Type             Retained
-               
this            Foo              24750
...             
v table         HashMap$Entry[]  24662
  v [0]         HashMap$Entry    200
    v value     Pair             156
      v first   Bar              60
        ...
        > code  String           36
      v second  Bar              60
        ...
        > code  String           36
    v key       Name             72
      ...
      > name    String           36
  > [1]         HashMap$Entry    200
  > [2]        <HashMap$Entry>   -
  ...
  > [233]       HashMap$Entry    600
  ...
  > [255]      <HashMap$Entry>   -

So as you can see, all the useful information is surrounded by lots of data that is useless to me. If I had fewer, larger objects holding the same data, my useful:useless ratio would be better, but I can't see how to implement this any other way. Is there some other way I can store my data that is still as convenient and easy to use as this?

EDIT

My application will need to be scalable to upwards of 6000 Bar instances and maybe as many Foo instances.

Upvotes: 2

Views: 1561

Answers (5)

lsoliveira

Reputation: 4640

Take a look here. You'll see that you need quite a lot more bytes than you think to store an object (String or other) in the JVM's heap.

36 bytes for a one-character string sounds about right, as you need to store a lot of metadata for the object that holds the character (be sure to account for the UTF-16 encoding) plus the String class overhead.

Upvotes: 0

Brian Agnew

Reputation: 272297

You say:

I have an application which takes data from a file and stores it for later use

and later (in a comment)

I've been asked to make it as memory efficient as possible

I suspect your most memory-efficient solution is to store the file and parse it upon request, rather than parse and store in advance. But do you really want to do this and suffer the related performance costs? I don't think your memory issues are particularly huge, but (as stated by others) I would investigate the flyweight pattern.

Upvotes: 0

Razvan

Reputation: 10093

You can try dropping the Bar and Pair objects and storing each pair as a simple String, e.g. "ab" (where "a" and "b" currently correspond to a Pair made of Bar("a") and Bar("b")).

You could also use the Flyweight pattern to share the common names across all Foo objects, since you have fooOne.keySet().equals(fooTwo.keySet()).
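
A minimal sketch of how these two suggestions could be combined, assuming (as the question states) that every Foo has the same key set: one shared name-to-slot index, and each Foo holding only a char[] with two characters per name. SharedKeys and CompactFoo are hypothetical names, and names are modelled as plain Strings instead of the asker's Name class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: a single instance is shared by every CompactFoo and maps each name to a slot.
final class SharedKeys {
    private final Map<String, Integer> slots = new LinkedHashMap<>();

    SharedKeys(List<String> names) {
        for (String name : names) {
            // Duplicate names keep their first slot.
            slots.putIfAbsent(name, slots.size());
        }
    }

    int slotOf(String name) {
        Integer slot = slots.get(name);
        if (slot == null) {
            throw new IllegalArgumentException("Unknown name: " + name);
        }
        return slot;
    }

    int size() {
        return slots.size();
    }
}

// Hypothetical replacement for Foo: two chars per name, no per-entry objects.
final class CompactFoo {
    private final SharedKeys keys;
    private final char[] codes; // codes[2 * slot] is the first code, codes[2 * slot + 1] the second

    CompactFoo(SharedKeys keys) {
        this.keys = keys;
        this.codes = new char[2 * keys.size()];
    }

    void put(String name, char first, char second) {
        int base = 2 * keys.slotOf(name);
        codes[base] = first;
        codes[base + 1] = second;
    }

    // Returns the pair as a two-character String such as "ab".
    String get(String name) {
        int base = 2 * keys.slotOf(name);
        return new String(codes, base, 2);
    }
}

final class CompactFooDemo {
    public static void main(String[] args) {
        SharedKeys keys = new SharedKeys(Arrays.asList("n1", "n2", "n3"));
        CompactFoo fooOne = new CompactFoo(keys);
        CompactFoo fooTwo = new CompactFoo(keys);
        fooOne.put("n1", 'a', 'b');
        fooTwo.put("n1", 'c', 'd');
        System.out.println(fooOne.get("n1"));
        System.out.println(fooTwo.get("n1"));
    }
}
```

With this layout the per-Foo cost is one small object plus one char[] of length 2n, and the map of names is paid for once. The trade-off is that a Foo no longer behaves like a HashMap, so callers would need to go through put and get.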

Upvotes: 0

Virmundi

Reputation: 2631

I think a lot of your problem is just object oriented code in general, and Unicode conversion specifically.

In Java a char takes two bytes, because strings are UTF-16. So at the very least you can expect to double your memory usage versus keeping a single-byte-encoded file on the drive. (Since Java 9, compact strings store Latin-1-only text at one byte per character, but the per-object overhead remains.)

Each object, even each little string, is going to cost at least a word's worth of information because of the reference the JVM needs to point to it. So each pair of data is a word for the key and a word for the value, plus the actual size of each. These references then get added to the hash map, which uses a word to point to itself and several words to point to the entry set. And so it goes. This is object oriented programming.

Now you could change your code to store the pair as a simple char[2]. This would cut down on your memory footprint. Then when you want to interact with it, you could wrap the array with a Pair object.
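
A rough sketch of the char[2] idea, assuming names can be represented as plain Strings. CharPair and ArrayBackedFoo are hypothetical names standing in for the asker's Pair and Foo; the wrapper is only created when a caller asks for a value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the asker's Pair, specialised to two chars.
final class CharPair {
    private final char[] data; // wraps the stored array, does not copy it

    CharPair(char[] data) {
        if (data.length != 2) {
            throw new IllegalArgumentException("Expected exactly two chars");
        }
        this.data = data;
    }

    char first() {
        return data[0];
    }

    char second() {
        return data[1];
    }
}

// Hypothetical Foo variant: the long-lived value per name is a bare char[2].
final class ArrayBackedFoo {
    private final Map<String, char[]> pairs = new HashMap<>();

    void put(String name, char first, char second) {
        pairs.put(name, new char[] { first, second });
    }

    // The wrapper is created per call and becomes garbage as soon as the caller drops it.
    CharPair get(String name) {
        char[] stored = pairs.get(name);
        return stored == null ? null : new CharPair(stored);
    }
}

final class ArrayBackedFooDemo {
    public static void main(String[] args) {
        ArrayBackedFoo foo = new ArrayBackedFoo();
        foo.put("n1", 'a', 'b');
        CharPair pair = foo.get("n1");
        System.out.println(pair.first() + "," + pair.second());
    }
}
```

This removes the Pair, Bar and String objects from each stored value, but the HashMap entries and keys are still there, so it only addresses part of the overhead seen in the profiler output.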

Upvotes: 0

Adam Arold

Reputation: 30528

I'm not entirely sure that I've understood your question correctly, but in this situation using Flyweights may do the trick.

Flyweight pattern
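
As an illustration of what a Flyweight might look like here, the sketch below assumes Bar is immutable and fully described by its single character code, which the question suggests but does not state outright. The factory method of and the accessor code are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable Bar holding a single character code.
final class Bar {
    // One shared instance per distinct code.
    private static final Map<Character, Bar> CACHE = new HashMap<>();

    private final char code;

    private Bar(char code) {
        this.code = code;
    }

    // Flyweight factory: returns the shared Bar for this code, creating it on first use.
    static synchronized Bar of(char code) {
        return CACHE.computeIfAbsent(code, c -> new Bar(c));
    }

    char code() {
        return code;
    }
}

final class BarFlyweightDemo {
    public static void main(String[] args) {
        Bar first = Bar.of('a');
        Bar second = Bar.of('a');
        // Both references point at the same object, so only one Bar('a') lives on the heap.
        System.out.println(first == second);
    }
}
```

Because every Pair would then reference shared Bar instances, the number of Bar objects is bounded by the number of distinct codes instead of growing with the number of Foo objects. The same approach could be applied to Name, since all Foo objects share the same keys.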

Upvotes: 3
