Li Haoyi

Reputation: 15812

Java: Data Structure Memory Estimates

Does anyone have a list of rough rule-of-thumb estimators for the various data structures? e.g.

  • Arrays
  • Lists
  • HashMaps
  • LinkedLists

I remember seeing some of these estimates thrown around in various places, but I can't seem to find one right now.

I know it's actually incredibly complicated, especially for things like HashMaps, but I'm looking for something really rough, like:

Memory(HashMap) = fixedOverhead + variableOverhead * tableSize + A*numKeys + B*numValues + Memory(allKeys) + Memory(allValues)

of course it'll vary a lot depending on this and that, but even a rough within-a-factor-of-2 estimate would be immensely useful.
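For concreteness, a back-of-the-envelope calculation in exactly that shape might look like the sketch below. Every constant here is an illustrative assumption (plausible for a 64-bit HotSpot JVM with compressed oops), not a measured value:

```java
// Rough, illustrative HashMap memory estimate in the shape described above.
// All constants are assumptions, not measured JVM values.
public class HashMapEstimate {
    static long estimate(int tableSize, int numEntries,
                         long memKeys, long memValues) {
        long fixedOverhead = 48;  // assumed: HashMap header + fields
        long perBucket = 4;       // assumed: one compressed reference per table slot
        long perEntry = 32;       // assumed: entry object (header, hash, 3 refs)
        return fixedOverhead
             + perBucket * (long) tableSize
             + perEntry * (long) numEntries
             + memKeys + memValues;
    }

    public static void main(String[] args) {
        // 1M Integer->Integer entries, table of 2^21 slots,
        // each boxed Integer assumed to cost 16 bytes
        long bytes = estimate(1 << 21, 1_000_000,
                              16L * 1_000_000, 16L * 1_000_000);
        System.out.println(bytes / (1024 * 1024) + " MiB (rough)");
    }
}
```

Plugging in different per-entry constants moves the answer around, but usually only within the factor-of-2 band mentioned above.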

Upvotes: 5

Views: 6058

Answers (5)

user unknown

Reputation: 36269

On InfoQ, there is a presentation (infoq-11-nov-jvmperformance.mp3) by an engineer at Twitter, available as PDF slides, MP3 audio and video.

It deals extensively with collections and other details of object sizes in the JVM.

Upvotes: 0

user unknown

Reputation: 36269

Here is a simple program that just consumes RAM:

import java.util.*;
/**
    RamInit (c) GPLv3

    @author Stefan Wagner
    @date Do 22. Mär 08:40:40 CET 2012

*/
public class RamInit
{
    private java.lang.Object consumer; 

    public RamInit (char type, int size)
    {
        switch (type) 
        {
            case 'a': Integer [] ai = new Integer [size]; 
                for (int i = 0; i < size; ++i) 
                    ai[i] = i; 
                consumer = ai; 
                break;
            case 'l': List<Integer> li = new ArrayList<Integer> (); 
                for (int i = 0; i < size; ++i) 
                    li.add (i); 
                consumer = li;
                break;
            case 'h': HashMap <Integer, Integer> hm = new HashMap <Integer, Integer> (); 
                for (int i = 0; i < size; ++i) 
                    hm.put (i, size - i); 
                consumer = hm;
                break;
            case 'L': LinkedList <Integer> ll = new LinkedList <Integer> (); 
                for (int i = 0; i < size; ++i) 
                    ll.add (i);     
                consumer = ll;          
                break;
            default: System.err.println ("invalid: " + type);
        }
    }

    public static void main (String args[])
    {
        char type = 'a';
        int size = 1000000; // 1M
        if (args.length == 2)
        {
            type = args[0].charAt (0);
            size = Integer.parseInt (args[1]);
        }
        try {
            new RamInit (type, size);
        }
        catch (OutOfMemoryError oome)
        {
            System.exit (1);
        }
    }
}

And here is a very simple script to test it:

#!/bin/bash

iterProg () {
ram=$1
maxram=$2 
typ=$3
size=$4
# echo java -Xmx${ram}M RamInit $typ $((size*1000*1000)) 
echo -n "." 
java -Xmx${ram}M RamInit $typ $((size*1000*1000)) && echo -en "\n"$typ $size ${ram}M || { 
    if (($ram==$maxram))
    then
        # echo "fail" 
        return 
    else 
        iterProg $((ram+1)) $maxram $typ $size 
    fi
    }
}

# try from 16 MB to 256
for typ in {a,l,h,L}; do 
  for size in {1,2,4}; do 
    iterProg $((size*17+1)) 256 $typ $size 
  done
done

It is a primitive iterator and should be replaced by something more sophisticated - for example, if you find that RamInit needs 37 MB for collection 'a' with 1M elements, you should start the 2M-element test with more than that.

And you should choose the steps as in a binary search: for example, if 20M is too little, check 128M, then (20+128)/2, then the average of that, narrowing either the lower or the upper limit depending on success or failure.
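That binary-search idea can be sketched as follows. Here `fits` is just a stand-in predicate; in practice it would launch `java -Xmx<mb>M RamInit <type> <size>` via `ProcessBuilder` and report whether the exit code was 0:

```java
import java.util.function.IntPredicate;

public class MinHeapSearch {
    // Smallest heap size (in MB) in [lo, hi] for which fits(mb) succeeds.
    // fits must be monotonic: once a size works, every larger size works too.
    static int minimalHeap(int lo, int hi, IntPredicate fits) {
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (fits.test(mid)) hi = mid;  // mid works: try smaller heaps
            else lo = mid + 1;             // mid fails: need more RAM
        }
        return lo;
    }

    public static void main(String[] args) {
        // Stand-in predicate: pretend the program needs at least 37 MB.
        int needed = minimalHeap(16, 256, mb -> mb >= 37);
        System.out.println(needed + "M");  // prints 37M
    }
}
```

Bisection needs only about log2(256 - 16) ≈ 8 JVM launches per data point, instead of one per megabyte.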

Since a HashMap stores 2 ints per element, it can be expected to start at roughly double the size of List/Array/Vector. However - time flies like an arrow, and while writing, the results are already in:

bash iterRamFind.sh 
..
a 1 19M.....
a 2 39M...............
a 4 83M..
l 1 19M.......
l 2 41M.......................
l 4 91M..............................................
h 1 63M.............................................................................................
h 2 127M...........................................................................................................................................................................................
h 4 255M......................
L 1 39M.................................................
L 2 83M...............................................................................................
L 4 163

The value 17 explains itself from the first experiments. As we can see, the size increases in a nearly linear manner.
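From two of the measured points you can extract a rough bytes-per-element slope: for the array case, (83 - 19) MB over (4 - 1) million elements is about 22 bytes per element, which is plausible for a reference plus a boxed Integer. A tiny sketch of that arithmetic:

```java
public class SlopeEstimate {
    // Bytes per element from two (millionsOfElements, heapMB) measurements.
    static double bytesPerElement(int m1, int mb1, int m2, int mb2) {
        double deltaBytes = (mb2 - mb1) * 1024.0 * 1024.0;
        double deltaElems = (m2 - m1) * 1_000_000.0;
        return deltaBytes / deltaElems;
    }

    public static void main(String[] args) {
        // Measurements taken from the run above
        System.out.printf("array ('a'):   ~%.1f bytes/element%n",
                bytesPerElement(1, 19, 4, 83));   // Integer[]
        System.out.printf("hashmap ('h'): ~%.1f bytes/entry%n",
                bytesPerElement(1, 63, 4, 255));  // HashMap<Integer,Integer>
    }
}
```

Using the slope rather than a single point cancels out the fixed JVM overhead that is included in every measurement.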

Modifying the code to check the influence of using Longs instead is up to you - I guess you will end up with a factor of 2.

Upvotes: 0

Louis Wasserman

Reputation: 198471

This table is quite exhaustive and deals precisely with the JDK implementation choices, measured in bytes per entry/element. If you want to repeat the measurements on your own machine -- if you're running on different hardware, perhaps -- this Google Code site will let you download its source: http://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures

Upvotes: 2

Drew

Reputation: 426

This is pretty rough, but these estimates should be correct. They are for bare-bones data structures, not including length variables or any of the other extras that tend to be included in Java.

where dataType is the type of data being stored

Array: (length n)
    n*sizeOf(dataType)

LinkedList:
    n*(sizeOf(dataType)+sizeOf(pointer))+sizeOf(pointer[head pointer])

List: 
    Array-backed=SpaceEfficiency(Array)
    LinkedList-backed=SpaceEfficiency(LinkedList)

HashMap: with v values, k keys
    k*sizeOf(keyDataType) + v*sizeOf(valueDataType)

Tree: k-way tree with n nodes
    n*(sizeOf(nodeDataType)+(k*sizeOf(pointer)))+sizeOf(pointer[head pointer])

Graph: e edges, v vertices
    AdjacencyList:
        at most: v*((v*sizeOf(vertexDataType))+(e*sizeOf(pointer))) fully connected graph
        at least: v*sizeOf(vertexDataType) disconnected graph
    AdjacencyMatrix:
        v^2*sizeOf(int)
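The formulas above can be turned into a tiny calculator. The sizes used here are assumptions for illustration (8-byte pointers, 4-byte ints), not JVM-measured values:

```java
public class RoughSizes {
    static final long PTR = 8;  // assumed pointer size in bytes
    static final long INT = 4;  // assumed int size in bytes

    static long array(long n, long elem)            { return n * elem; }
    static long linkedList(long n, long elem)       { return n * (elem + PTR) + PTR; }
    static long kWayTree(long n, long node, long k) { return n * (node + k * PTR) + PTR; }
    static long adjacencyMatrix(long v)             { return v * v * INT; }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("array of ints:       " + array(n, INT));
        System.out.println("linked list of ints: " + linkedList(n, INT));
        System.out.println("binary tree of ints: " + kWayTree(n, INT, 2));
        System.out.println("1000-vertex matrix:  " + adjacencyMatrix(1000));
    }
}
```

Note that these bare-bones numbers are lower bounds: a real JVM adds object headers, alignment padding, and boxing on top of each formula.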

Upvotes: 0
