Reputation: 15812
Does anyone have a list of rough rule-of-thumb estimators for the various data structures? e.g.
- Arrays
- Lists
- HashMaps
- LinkedLists
I remember seeing some of these estimates thrown around in various places, but I can't seem to find one right now.
I know it's actually incredibly complicated, especially for things like HashMaps, but I'm looking for something really rough, like:
Memory(HashMap) = fixedOverhead + variableOverhead * tableSize + A*numKeys + B*numValues + Memory(allKeys) + Memory(allValues)
of course it'll vary a lot depending on this and that, but even a rough within-a-factor-of-2 estimate would be immensely useful.
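For concreteness, a sketch of the kind of estimator I mean is below; every constant in it is a made-up placeholder, not a measured value.

public class RoughEstimate
{
    // all placeholder values, purely to illustrate the shape of the formula
    static final long FIXED_OVERHEAD    = 48; // per-map fixed cost (made up)
    static final long VARIABLE_OVERHEAD = 4;  // per table slot (made up)
    static final long A = 16;                 // per key (made up)
    static final long B = 16;                 // per value (made up)

    static long hashMapBytes(long tableSize, long numKeys, long numValues,
                             long memAllKeys, long memAllValues)
    {
        return FIXED_OVERHEAD
             + VARIABLE_OVERHEAD * tableSize
             + A * numKeys
             + B * numValues
             + memAllKeys
             + memAllValues;
    }
}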
Upvotes: 5
Views: 6058
Reputation: 36269
On InfoQ there is a presentation, infoq-11-nov-jvmperformance.mp3, by an engineer at Twitter, available as PDF slides, audio (mp3), and video.
It deals a lot with collections and other details of object sizes in the JVM.
Upvotes: 0
Reputation: 36269
Here is a simple program, which just consumes RAM:
import java.util.*;

/**
    RamInit (c) GPLv3
    @author Stefan Wagner
    @date Thu 22 Mar 08:40:40 CET 2012
*/
public class RamInit
{
    // holds the filled data structure so it cannot be garbage-collected early
    private java.lang.Object consumer;

    public RamInit (char type, int size)
    {
        switch (type)
        {
            case 'a':
                Integer[] ai = new Integer[size];
                for (int i = 0; i < size; ++i)
                    ai[i] = i;
                consumer = ai;
                break;
            case 'l':
                List<Integer> li = new ArrayList<Integer>();
                for (int i = 0; i < size; ++i)
                    li.add(i);
                consumer = li;
                break;
            case 'h':
                HashMap<Integer, Integer> hm = new HashMap<Integer, Integer>();
                for (int i = 0; i < size; ++i)
                    hm.put(i, size - i);
                consumer = hm;
                break;
            case 'L':
                LinkedList<Integer> ll = new LinkedList<Integer>();
                for (int i = 0; i < size; ++i)
                    ll.add(i);
                consumer = ll;
                break;
            default:
                System.err.println("invalid: " + type);
        }
    }

    public static void main (String[] args)
    {
        char type = 'a';
        int size = 1000000; // 1M
        if (args.length == 2)
        {
            type = args[0].charAt(0);
            size = Integer.parseInt(args[1]);
        }
        try
        {
            new RamInit(type, size);
        }
        catch (OutOfMemoryError oome)
        {
            System.exit(1);
        }
    }
}
And here is a very simple script to test it:
#!/bin/bash

iterProg () {
    ram=$1
    maxram=$2
    typ=$3
    size=$4
    # echo java -Xmx${ram}M RamInit $typ $((size*1000*1000))
    echo -n "."
    java -Xmx${ram}M RamInit $typ $((size*1000*1000)) && echo -en "\n"$typ $size ${ram}M || {
        if (( ram == maxram ))
        then
            # echo "fail"
            return
        else
            iterProg $((ram+1)) $maxram $typ $size
        fi
    }
}

# try heap sizes from (size*17+1) MB up to 256 MB
for typ in {a,l,h,L}; do
    for size in {1,2,4}; do
        iterProg $((size*17+1)) 256 $typ $size
    done
done
It is a primitive iterator and should be replaced by something more sophisticated - for example, if you need 37 MB to run RamInit with collection 'a' and 1M elements, you should start the 2M-element run with more than that.
And you should choose the steps as in a binary search: for example, if 20M is too little, check 128M, then (20+128)/2, and keep halving the interval, moving the lower or upper limit depending on failure or success (a sketch of that follows below).
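A minimal sketch of that binary search, assuming the RamInit class above is compiled and on the classpath; the class name XmxSearch and the starting bounds are my own choices, not part of the script above.

import java.io.IOException;

public class XmxSearch
{
    // Runs RamInit in a fresh JVM with the given heap limit; true if it exits 0.
    static boolean fits(int ramMb, char type, int size)
            throws IOException, InterruptedException
    {
        Process p = new ProcessBuilder("java", "-Xmx" + ramMb + "M",
                "RamInit", String.valueOf(type), String.valueOf(size))
                .inheritIO().start();
        return p.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception
    {
        char type = 'h';
        int size = 1000000;
        int lo = 16;   // assumed to fail
        int hi = 256;  // assumed to succeed
        while (lo + 1 < hi)
        {
            int mid = (lo + hi) / 2;
            if (fits(mid, type, size))
                hi = mid;  // mid is enough, try smaller
            else
                lo = mid;  // mid is too small, go larger
        }
        System.out.println("minimum heap: " + hi + "M");
    }
}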
Since a HashMap stores two Integers per element, it could start at roughly double the size of List/Array/Vector. However - time flies like an arrow, and while writing this, the result is already finished:
bash iterRamFind.sh
..
a 1 19M.....
a 2 39M...............
a 4 83M..
l 1 19M.......
l 2 41M.......................
l 4 91M..............................................
h 1 63M.............................................................................................
h 2 127M...........................................................................................................................................................................................
h 4 255M......................
L 1 39M.................................................
L 2 83M...............................................................................................
L 4 163
The value 17 comes from the first experiments. As we can see, the size increases in a nearly linear manner.
Modifying the code to check the influence of using Longs is up to you - I guess you will end up with a factor of 2.
Upvotes: 0
Reputation: 198471
This table is quite exhaustive, and deals precisely with the JDK implementation choices measured in bytes per entry/element. If you want to do it on your own machine -- if you're running on a different machine, perhaps -- this Google Code site will let you download its source. http://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures
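If you would rather measure things yourself instead of downloading that project, a minimal sketch using the standard java.lang.instrument API is below. Note that getObjectSize() reports only the shallow size of a single object, so you still have to walk the object graph (which is what memory-measurer does for you) to get bytes per entry. The class name SizeAgent and the jar name are illustrative.

import java.lang.instrument.Instrumentation;

// Package this class in a jar whose manifest declares "Premain-Class: SizeAgent"
// and run your program with -javaagent:sizeagent.jar (jar name is just an example).
public class SizeAgent
{
    private static volatile Instrumentation inst;

    public static void premain(String args, Instrumentation instrumentation)
    {
        inst = instrumentation;
    }

    // Shallow size of a single object, as reported by the JVM.
    public static long shallowSizeOf(Object o)
    {
        return inst.getObjectSize(o);
    }
}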
Upvotes: 2
Reputation: 4119
Check this out: From Java code to Java heap - Understanding and optimizing your application's memory usage.
Upvotes: 3
Reputation: 426
This is pretty rough, but these estimates should be correct. They are for the bare-bones data structures, not including length variables or any other extras that tend to be included in Java. (A code sketch of these formulas follows the list.)
where dataType is the type of data being stored
Array: (length n)
n*sizeOf(dataType)
LinkedList:
n*(sizeOf(dataType)+sizeOf(pointer))+sizeOf(pointer[head pointer])
List:
Array-backed=SpaceEfficiency(Array)
LinkedList-backed=SpaceEfficiency(LinkedList)
HashMap: with k keys, v values
k*sizeOf(keyDataType) + v*sizeOf(valueDataType)
Tree: k-way tree with n nodes
n*(sizeOf(nodeDataType)+(k*sizeOf(pointer)))+sizeOf(pointer[head pointer])
Graph: e edges, v vertices
AdjacencyList:
at most: v*sizeOf(vertexDataType) + e*sizeOf(pointer), where e approaches v^2 in a fully connected graph
at least: v*sizeOf(vertexDataType) for a disconnected graph (no edges)
AdjacencyMatrix:
v^2*sizeOf(int)
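A small sketch translating those formulas into code; the 8-byte pointer and 4-byte int sizes are assumptions, and object headers/padding are ignored just like in the formulas above.

public class SpaceEstimates
{
    static final long POINTER = 8; // assumed reference size in bytes
    static final long INT     = 4; // sizeOf(int) in bytes

    static long array(long n, long sizeOfElement) {
        return n * sizeOfElement;
    }

    static long linkedList(long n, long sizeOfElement) {
        return n * (sizeOfElement + POINTER) + POINTER; // nodes plus head pointer
    }

    static long hashMap(long k, long sizeOfKey, long v, long sizeOfValue) {
        return k * sizeOfKey + v * sizeOfValue;
    }

    static long kWayTree(long n, long sizeOfNode, long k) {
        return n * (sizeOfNode + k * POINTER) + POINTER;
    }

    static long adjacencyMatrix(long v) {
        return v * v * INT;
    }
}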
Upvotes: 0