Avner Levy

Reputation: 6741

Minimizing application data memory overhead in Java processes

I need to store a lot of data (objects) in memory for computations.
Since the computations are based on this data, it is critical that all of it resides in the memory of a single JVM process.
Most of the data will be built from Strings, Integers and other sub-objects (collections such as HashSet, etc.).
Since the memory overhead of Java objects is significant (Strings are UTF-16, and each object carries a header of at least 8 bytes), I'm looking for libraries that can store such data in memory with lower overhead.
I've read interesting articles about reducing memory:
* http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
* http://blog.griddynamics.com/2010/01/java-tricks-reducing-memory-consumption.html

I was wondering whether a library for such scenarios already exists, or whether I'll need to start from scratch.
To better understand my requirement, imagine a server that processes a high volume of records and needs to analyze them against millions of other records that are kept in memory (for a high processing rate).

Upvotes: 7

Views: 553

Answers (3)

JoG

Reputation: 6732

For the String part, you can store the byte[] you get from String.getBytes("UTF-8"). If you need a String object again, you can recreate it from the byte array. This will of course cost some extra CPU for creating the String objects over and over again, so it is a tradeoff between size and speed.
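The idea above can be sketched roughly as follows. The class name CompactString and its methods are made up for illustration, and the sketch uses StandardCharsets.UTF_8 (Java 7+) rather than the charset name string. It only saves memory when the text is mostly ASCII or Latin; text that needs three UTF-8 bytes per character can end up larger than in UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper: keeps only the UTF-8 bytes of a string.
final class CompactString {
    private final byte[] utf8;

    CompactString(String s) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    // Number of payload bytes held (excludes object and array headers).
    int sizeInBytes() {
        return utf8.length;
    }

    // Decodes a new String on every call: this is the CPU side of the tradeoff.
    @Override
    public String toString() {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // equals/hashCode work on the bytes, so instances can be used as map keys
    // without decoding back to a String.
    @Override
    public boolean equals(Object o) {
        return o instanceof CompactString
                && Arrays.equals(utf8, ((CompactString) o).utf8);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(utf8);
    }
}
```

Note that each wrapper still costs an object header plus an array header, so for very short strings the saving may be small or even negative.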

Upvotes: 2

Persimmonium

Reputation: 15789

Regarding strings, also look into the -XX:+UseCompressedStrings JVM option, although it looks like it has been dropped from the latest JVM updates; see this other question.

Upvotes: 0

radai

Reputation: 24192

For collection overhead, have a look at Trove. Its memory overhead is lower than that of the built-in Collections classes (especially for maps, and for sets, which in the JDK are backed by maps).
If you have large objects, it might be worthwhile to store them "serialized" in some compact binary representation (not Java serialization) and deserialize them back into full-blown objects when needed.
You could also use a cache library that can page out to disk; take a look at Infinispan or Ehcache. Also, some of those libraries (Ehcache among them, if memory serves) provide "off-heap storage" as part of your JVM process: a chunk of memory that is not subject to GC and is managed by the (native) library. If you have an efficient binary representation, you could store it there (it won't lower your footprint, but it might make GC behave better).
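As a rough illustration of the compact binary representation and off-heap ideas above, here is a minimal sketch using only the JDK's java.nio.ByteBuffer. The record layout (an int id, a long timestamp and a UTF-8 name) and the RecordCodec class are assumptions made up for this example; this is not the API of Ehcache, Infinispan or Trove, which have their own interfaces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical codec for a record of (int id, long timestamp, String name).
// Layout: 4 bytes id | 8 bytes timestamp | 4 bytes name length | name as UTF-8.
final class RecordCodec {

    static byte[] encode(int id, long timestamp, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + nameBytes.length);
        buf.putInt(id).putLong(timestamp).putInt(nameBytes.length).put(nameBytes);
        return buf.array();
    }

    // Fields are read straight from the bytes, so a computation that needs
    // only one field does not have to rebuild the whole object.
    static int readId(byte[] rec) {
        return ByteBuffer.wrap(rec).getInt(0);
    }

    static long readTimestamp(byte[] rec) {
        return ByteBuffer.wrap(rec).getLong(4);
    }

    static String readName(byte[] rec) {
        int len = ByteBuffer.wrap(rec).getInt(12);
        return new String(rec, 16, len, StandardCharsets.UTF_8);
    }

    // Copies an encoded record into a direct buffer, whose contents live
    // outside the garbage-collected heap. The returned buffer is ready to read.
    static ByteBuffer toOffHeap(byte[] rec) {
        ByteBuffer direct = ByteBuffer.allocateDirect(rec.length);
        direct.put(rec);
        direct.flip();
        return direct;
    }
}
```

In practice you would probably pack many records into one large buffer and keep an index of offsets, since allocating one direct buffer per record is expensive; the per-record version here just keeps the sketch short.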

Upvotes: 6
