Sebastian Klenk
Sebastian Klenk

Reputation: 41

How to create memory efficient data structures in Java

If I understood correctly, Java creates some overhead per class. If I want to create typical data structures such as linked lists, trees, tries, etc. the individual (list) items will be classes and therefore create a significant overhead as opposed to similar data structures in C. This becomes especially difficult for very large data sets. Is there some better way to implement those kinds of data structures in Java such that I wont have the overhead associated with storing classes in memory?

Here the memory consumption of Java objects is described. If I have millions of objects, the overhead by using objects might become too expensive. So I was wondering if there are better ways to approach such a situation.

Upvotes: 2

Views: 2063

Answers (4)

starikoff
starikoff

Reputation: 1650

You can implement these collections over chunks of bytes (obtained as new byte[...] or ByteBuffer.allocate[Direct](...) or unsafe.allocateMemory(...)). You can then manage this memory manually: pack/unpack your objects to and from byte chunks along with additional data (like indices of left and right values for a binary tree, index of next for a linked list etc.) This way you will not have to spend memory on object headers, extra references, alignment (although you might decide you do need to introduce your own alignment); can have your objects offheap; can have them mapped to a filesystem for persistence etc. However, it's not simple and incurs subtleties (e.g. you might start depending on malloc implementation and lose JVM heap optimizations; lose memory model guarantees; your objects might be split between cache lines; you will lose benefits of GC compaction etc.). I'm not saying any of these is a show-stopper, just that it's not all roses and you should understand what exactly you are gaining. If you have millions of objects, well, it's likely that the overhead is in 100s of megabytes. Make sure it's worth it to try to save them (compared to how much necessary data takes + compared to how large your heap is).

Upvotes: 1

Lie Ryan
Lie Ryan

Reputation: 64837

If you have datasets where java's object size overhead is a practical issue, I'd suggest consider using a database. You can start with an in-memory, embedded database, like sqlite, h2, or redis.

As your data gets larger, you'll need more complex management. Updating cross references, indexes, and the like manually to ensure that your data can be efficiently queried is a huge effort that a database can help with.

Using a proper database also allows you to grow the data size further when your data starts reaching hundreds of gigabytes level where it no longer fits on memory, and when you have to transition to actually start using the disk, or even when you reach multi terabytes level when you have to use multiple machine(s) to hold the data, without major rewrites.

A proper database can grow together with your application, a bunch of objects in memory can't.

Upvotes: 0

Kevin Anderson
Kevin Anderson

Reputation: 4592

A quick Google search on "c++ library jni" turned up this article, entitled Wrapping a C++ library with JNI – introduction that might prove interesting. I didn't read it, so I make no recommendation or guarantee as to the contents.

Upvotes: 0

user3811082
user3811082

Reputation: 261

You always can use the c++ native code inside Java (JNI) to increase the performance and the level of control (I don't think you really need this and I'm not sure that you can surpass standard java code).

Upvotes: 0

Related Questions