Xeon

Reputation: 5989

java - memory usage

I'm developing an application which loads lots of data (like from csv).

I'm creating a List<List<SimpleCell>> and loading the cells I read into it. The SimpleCell class contains 5 Strings, and each String has 10 characters on average.

So I figure that if I read 1000 rows, each containing 160 columns, that gives 1000 * 160 = 160 000 SimpleCell instances, which should take about 160 000 * sizeof(SimpleCell.class) =~ 160 000 * 10 * 5 = 8 000 000 bytes =~ 7.63 MB.

But when I look at jconsole (after clicking Perform GC), memory usage is about 790 MB. How can this be?

Note that I don't store references to any "temporary" objects. Here is the code where the memory usage rises:

        for(int i = r.getFromIndex(); i <= r.getToIndex(); ++i) {
            System.out.println("Processing: 'ZZ " + i + "'");
            List<SimpleCell> values = saxRead("ZT/ZZ " + i + "");
            rows.add(values);
        }

saxRead just creates an InputStream, parses it with SAX, closes the stream, and returns the cells (created by the SAXHandler), so there are only local variables (which I assume will be garbage collected soon afterwards).

I'm getting an out-of-heap-space error when reading 1000 rows, but I need to read approximately 7,000.

Obviously there's something I don't know about JVM memory. Why is memory usage so huge when loading this relatively small amount of data?

Upvotes: 2

Views: 391

Answers (4)

Peter Lawrey

Reputation: 533510

A String uses 48 bytes plus 2 bytes per character of text. A SimpleCell object uses 40 bytes, and the List holding them uses 1064 bytes.

This means each row uses 1064 + 160 * 40 + 5 * 160 * (48 + 20) bytes, or about 62 KB. With 1000 rows you would be using about 62 MB, which is much less than what you are seeing.
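The per-row arithmetic can be sketched in Java as follows. This is only a back-of-envelope calculation under the sizes quoted in this answer and the question's figures (160 columns, 5 Strings per cell, 10 characters per String); the class and method names are made up for illustration, and real sizes vary by JVM.

```java
/** Back-of-envelope estimate of heap use per row (hypothetical helper). */
final class RowEstimate {
    static final long STRING_OVERHEAD = 48; // bytes per String, excluding the text
    static final long BYTES_PER_CHAR = 2;   // UTF-16 code unit
    static final long CELL_BYTES = 40;      // one SimpleCell, excluding its Strings
    static final long LIST_BYTES = 1064;    // the List holding one row

    static long bytesPerRow(int columns, int stringsPerCell, int charsPerString) {
        long perString = STRING_OVERHEAD + BYTES_PER_CHAR * charsPerString;
        long cells = (long) columns * CELL_BYTES;
        long strings = (long) columns * stringsPerCell * perString;
        return LIST_BYTES + cells + strings;
    }

    public static void main(String[] args) {
        long perRow = bytesPerRow(160, 5, 10);
        System.out.println("bytes per row:       " + perRow);
        System.out.println("bytes for 1000 rows: " + perRow * 1000);
    }
}
```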

I suggest you use a memory profiler, e.g. VisualVM or YourKit, to see exactly how much memory is being used by what.

Depending on how you construct the Strings, you may retain even more memory than this. For example, it's likely you are retaining a reference to the original XML: on JVMs before Java 7u6, a substring shares the original String's backing char[], so keeping the substring keeps the whole original in memory.


You may find this class useful. It will reduce the amount of memory Strings use if they are using more than they need and reduce duplicates using a fixed size cache.

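import java.lang.ref.WeakReference;

// Fixed-size, lossy cache of canonical String instances (declare it as a nested class).
static class StringCache {
    final WeakReference<String>[] strings;
    final int mask;

    @SuppressWarnings("unchecked")
    StringCache(int size) {
        // Round the capacity up to a power of two (minimum 128) so "& mask" can replace modulo.
        int size2 = 128;
        while (size2 < size)
            size2 *= 2;
        strings = new WeakReference[size2];
        mask = size2 - 1;
    }

    public String intern(String text) {
        if (text.length() == 0) return "";

        int hash = text.hashCode() & mask;
        WeakReference<String> wrs = strings[hash];
        if (wrs != null) {
            String ret = wrs.get();
            // Cache hit: reuse the instance that is already stored.
            if (text.equals(ret))
                return ret;
        }
        // Miss or collision: store a compact copy, replacing whatever occupied the slot.
        String ret = new String(text);
        strings[hash] = new WeakReference<String>(ret);
        return ret;
    }
}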
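For comparison, the same deduplication idea can be sketched with a plain HashMap. This is not the class above: it holds strong references and grows without bound, so it only suits data with a limited number of distinct values. The name SimpleStringPool is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Unbounded pool that returns one shared instance per distinct String value. */
final class SimpleStringPool {
    private final Map<String, String> pool = new HashMap<>();

    String canonical(String s) {
        // putIfAbsent returns the previously stored instance, or null if s was new.
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SimpleStringPool pool = new SimpleStringPool();
        String a = pool.canonical(new String("ZZ 1"));
        String b = pool.canonical(new String("ZZ 1"));
        System.out.println("same instance: " + (a == b));
        System.out.println("distinct values stored: " + pool.size());
    }
}
```

Calling canonical on every cell value as it is created means repeated values share one String object instead of each cell holding its own copy.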

Upvotes: 3

gpeche

Reputation: 22514

Java is very memory hungry. Consider these estimates:

32-bit VM:

Size of one of your Strings (approx)

10 UTF-16 chars = 20 bytes

1 array length = 4 bytes

1 array object header = 8 bytes

1 array reference = 4 bytes

1 offset, count, hashcode (internal fields) = 12 bytes

1 object header = 8 bytes

1 of your typical Java Strings = 20 + 4 + 8 + 4 + 12 + 8 = 56 bytes

Size of a Simple Cell (approx, including Strings)

5 Strings = 56 * 5 = 280 bytes

5 String references = 5 * 4 bytes = 20 bytes

1 object header = 8 bytes

1 SimpleCell = 280 + 20 + 8 = 308 bytes

160000 SimpleCell = 308 * 160000 = 49280000 bytes

64-bit VM (with no compressed oops)

Size of one of your Strings (approx)

10 UTF-16 chars = 20 bytes

1 array length = 4 bytes

1 array object header = 8 bytes

1 array reference = 8 bytes

1 offset, count, hashcode (internal fields) = 12 bytes

1 object header = 8 bytes

1 of your typical Java Strings = 20 + 4 + 8 + 8 + 12 + 8 = 60 bytes

Size of a Simple Cell (approx, including Strings)

5 Strings = 60 * 5 = 300 bytes

5 String references = 5 * 8 bytes = 40 bytes

1 object header = 8 bytes

1 SimpleCell = 300 + 40 + 8 = 348 bytes

160000 SimpleCell = 348 * 160000 = 55680000 bytes

Obviously this is very far from your 790 MB (which looks like a leak), but it is still almost an order of magnitude more than what you estimated.
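The two estimates differ only in the reference size, so they can be reproduced with one small calculation. This is a rough sketch using the field sizes listed in this answer; it ignores alignment padding and the List overhead, and the class and method names are hypothetical.

```java
/** Reproduces the approximate String and SimpleCell sizes listed above. */
final class CellSizeEstimate {
    static final long OBJECT_HEADER = 8;  // header size assumed in this answer
    static final long ARRAY_LENGTH = 4;   // length field of the char[]
    static final long STRING_FIELDS = 12; // offset, count, hashcode

    /** Approximate bytes for one String of the given length; refSize is 4 or 8. */
    static long stringBytes(int chars, int refSize) {
        long charData = 2L * chars; // UTF-16
        return charData + ARRAY_LENGTH + OBJECT_HEADER + refSize + STRING_FIELDS + OBJECT_HEADER;
    }

    /** Approximate bytes for one SimpleCell, including its Strings. */
    static long cellBytes(int stringsPerCell, int chars, int refSize) {
        return stringsPerCell * stringBytes(chars, refSize)
                + (long) stringsPerCell * refSize
                + OBJECT_HEADER;
    }

    public static void main(String[] args) {
        int cells = 160_000;
        for (int refSize : new int[] {4, 8}) {
            long perCell = cellBytes(5, 10, refSize);
            System.out.println(refSize + "-byte references: " + perCell
                    + " bytes per cell, " + perCell * cells + " bytes total");
        }
    }
}
```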

Upvotes: 1

ddyer

Reputation: 1788

Use VisualVM to profile your heap usage, and be prepared to be surprised.
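As a quick check before reaching for a profiler, used heap can also be read from inside the program with java.lang.Runtime. The sketch below is only a rough probe: the String[] rows are stand-in data rather than the question's SimpleCell objects, HeapProbe is a hypothetical name, and System.gc() is merely a hint to the JVM, so the numbers are approximate.

```java
import java.util.ArrayList;
import java.util.List;

/** Rough before/after measurement of used heap around a data load. */
final class HeapProbe {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // a hint only; the JVM may ignore it
        long before = usedHeapBytes();

        // Stand-in data: 1000 rows of 160 short Strings each.
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String[] row = new String[160];
            for (int j = 0; j < row.length; j++) {
                row[j] = "cell-" + i + "-" + j;
            }
            rows.add(row);
        }

        System.gc();
        long after = usedHeapBytes();
        System.out.println("rows kept alive: " + rows.size());
        System.out.println("approx. heap growth in bytes: " + (after - before));
    }
}
```

A heap dump opened in VisualVM then shows which classes account for the growth, which this probe cannot tell you.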

Upvotes: 2

fmgp

Reputation: 1636

JVM memory management introduces a lot of overhead. For example, on a 32-bit VM, a String with 5 characters consumes 58 bytes of memory (not just 5!):

JVM object overhead: 16 bytes + bookkeeping fields: 12 bytes + pointer to char[]: 4 bytes + char[] JVM overhead: 16 bytes + data: 10 bytes

Upvotes: 2
