imrichardcole

Reputation: 4685

File size vs. in memory size in Java

If I take an XML file that is around 2kB on disk, load its contents into memory as a String in Java and then measure the object size, it's around 33kB.

Why the huge increase in size?
If I do the same thing in C++ the resulting string object in memory is much closer to the 2kB.

To measure the memory in Java I'm using Instrumentation. For C++, I take the length of the serialized object (e.g. a string).

Upvotes: 14

Views: 2903

Answers (6)

Tim Autin

Reputation: 6165

As stated in other answers, Java's String adds overhead. If you need to store a large number of strings in memory, I suggest storing them as byte[] instead. That way the size in memory should be about the same as the size on disk, provided you encode with the same charset the file uses.

String -> byte[] :

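String a = "hello";
// Name the charset explicitly; getBytes() with no argument uses the platform default.
byte[] aBytes = a.getBytes(java.nio.charset.StandardCharsets.UTF_8);

byte[] -> String :

String b = new String(aBytes, java.nio.charset.StandardCharsets.UTF_8);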
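To get a feel for the saving, here is a minimal sketch that compares the encoded byte count with the size of the same text at 2 bytes per char. The class and method names (EncodedSize, utf8Bytes, utf16Bytes) are made up for illustration, and the figures cover character data only, not object or array headers.

```java
import java.nio.charset.StandardCharsets;

final class EncodedSize {
    // Bytes needed to hold the text as UTF-8, as a byte[] copy would.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed for the character data alone at 2 bytes per char (UTF-16 code units).
    static int utf16Bytes(String s) {
        return 2 * s.length();
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String accented = "h\u00e9llo"; // contains one non-ASCII character
        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii));       // prints "5 vs 10"
        System.out.println(utf8Bytes(accented) + " vs " + utf16Bytes(accented)); // prints "6 vs 10"
    }
}
```

For mostly-ASCII XML the byte[] form is roughly half the size of the char data. Text with many non-ASCII characters narrows that gap.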

Upvotes: 0

Val

Reputation: 11107

Yes, you should trigger a GC and give it time to finish: just call System.gc() and print the used heap (Runtime.totalMemory() minus Runtime.freeMemory()) in a loop. It is also better to create a million copies of the string in an array (measure with the array empty and then filled with strings), to be sure that you are measuring the size of the strings and not of other service objects that may be present in your program. A String alone cannot take 32 kB, but a hierarchy of XML objects can.
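A rough sketch of that measurement might look like the following. The names (HeapPerString, usedHeap, estimateBytesPerCopy), the 100 ms pause and the copy count are my own assumptions. System.gc() is only a hint to the JVM, so treat the result as a ballpark figure and not an exact size.

```java
import java.util.Arrays;

final class HeapPerString {
    // Ask for a GC, pause briefly, then report used heap (total minus free).
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a request only; the JVM may ignore it
        try {
            Thread.sleep(100); // assumed pause to let the collector finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    // Average heap growth per independent copy of the sample string.
    static long estimateBytesPerCopy(String sample, int copies) {
        String[] holder = new String[copies]; // allocated before the baseline
        long before = usedHeap();
        for (int i = 0; i < copies; i++) {
            // new String(char[]) copies the data, so nothing is shared between copies
            holder[i] = new String(sample.toCharArray());
        }
        long after = usedHeap();
        long perCopy = (after - before) / copies;
        // Touch the array after measuring so it stays reachable during the second GC.
        if (holder[copies - 1] == null) {
            throw new IllegalStateException("copy missing");
        }
        return perCopy;
    }

    public static void main(String[] args) {
        char[] chars = new char[2000]; // about the size of the 2 kB file
        Arrays.fill(chars, 'x');
        // 5000 copies instead of a million, to stay inside a default heap
        System.out.println(estimateBytesPerCopy(new String(chars), 5000) + " bytes per copy");
    }
}
```

Averaging over thousands of copies makes unrelated allocations small relative to the total, which is the point of filling an array instead of measuring a single string.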

That said, I cannot resist noting the irony that nobody cares about memory (and cache hits) in the world of Java. We all know that the JIT is improving and that it can outperform native C++ code in some cases. So there is no need to bother about memory optimization. Premature optimization is the root of all evil.

Upvotes: 0

bengro

Reputation: 1014

String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead. For a nonempty String of 10 characters or fewer, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length) ranges from 100 to 400 percent.
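The 100 to 400 percent range follows directly from those numbers. Here is a small sketch of the arithmetic, with illustrative names (StringOverhead, payloadBytes, overheadPercent) and the 24-byte figure taken as given from the text above, not measured on a specific JVM.

```java
final class StringOverhead {
    // Fixed per-String overhead as quoted above (an assumption, not a measurement).
    static final int FIXED_OVERHEAD_BYTES = 24;

    // Useful payload: 2 bytes per char plus 4 bytes for the length.
    static int payloadBytes(int chars) {
        return 2 * chars + 4;
    }

    // Overhead as a percentage of the payload, rounded down.
    static int overheadPercent(int chars) {
        return 100 * FIXED_OVERHEAD_BYTES / payloadBytes(chars);
    }

    public static void main(String[] args) {
        for (int chars = 1; chars <= 10; chars++) {
            System.out.println(chars + " chars: " + overheadPercent(chars) + "%");
        }
    }
}
```

One char gives 24 / 6 = 400 percent and ten chars give 24 / 24 = 100 percent. For a string of a couple of thousand characters the same fixed cost is negligible.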

More: What is the memory consumption of an object in Java?

Upvotes: 1

Michael Borgwardt

Reputation: 346317

Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that will be overhead for 2 objects (the String instance and an internal char array) with some fields, IIRC about 40 bytes overall.

So your "object size" of 33kB is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
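As a back-of-the-envelope check of the estimate above, a sketch like this puts a number on it. The names (StringSizeEstimate, estimateBytes) are illustrative, the 40-byte overhead is the approximate figure from this answer, and 2048 bytes stands in for the "around 2kB" file.

```java
final class StringSizeEstimate {
    // Approximate overhead for the String plus its char array (the "about 40 bytes" above).
    static final int OBJECT_OVERHEAD_BYTES = 40;

    // Estimated in-memory size for a file of single-byte characters stored as UTF-16.
    static long estimateBytes(long singleByteChars) {
        return 2 * singleByteChars + OBJECT_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        long fileBytes = 2048; // assumed size of the "around 2kB" file
        System.out.println(estimateBytes(fileBytes) + " bytes"); // prints "4136 bytes"
    }
}
```

That is roughly 4kB, about an eighth of the reported 33kB, which is why the measurement itself looks suspect.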

Upvotes: 4

Marius

Reputation: 2273

I think there are multiple factors involved. First of all, as Bruce Martin said, objects in Java have an overhead of 16 bytes per object; C++ objects do not. Second, Strings in Java might use 2 bytes per character instead of 1. Third, it could be that Java reserves more memory for its Strings than the C++ std::string does.

Please note that these are just ideas where the big difference might come from.

Upvotes: 4

Chechulin

Reputation: 2496

In Java, a String object carries some extra data that increases its size.
This includes object data, array data and some other fields, such as the array reference, offset and length.

Visit http://www.javamex.com/tutorials/memory/string_memory_usage.shtml for details.

Upvotes: 2
