Stack ADT (Abstract Data Type) implementation- Array vs linked

Question

What are the pros and cons of implementing Stack based on array vs linked. From my limited knowledge i feel that linked will always be a better way to implement Stack because:

1) no random acess is needed.

2)array are inefficient because they have to be resized(waste of time) and also they use storage inefficiently (some space is always wasted)

Im sure im missing something here because:

1) java.util.Stack is implemented based on array (its a subclass of java.util.Vector which is a legacy class from before the creation of java collections interface and is virtually similar to ArrayList). So the creators of java chose to do an array based implementation.

2)Ive read an answer here on stackoverflow that "An array based implementation on the other hand may have better runtime behavior". what is meant by this though i have no clue.

The comparison im looking for should include the following parameters:

1)Theoretical time and storage requirement.

2)Runtime performance (if different from theoretical comparison).

Please include any other important parameter which i have failed to mention due to my lack of knowledge. Im using java if that makes any difference at all on the conclusion.

P.S-I couldnt find all the points asked in this question in any other answer on this website so please only mark this question as duplicate only in case all my questions have been answered correctly and in enough detail in the other question.

P.P.S- I know this is a very long question so TIA for your effort :) Also if you feel it is too broad then kindly comment on how to break it up before you tag it as "too broad" so i may edit it as required.

Stuart Marks · Accepted Answer

First, you should be aware that java.util.Stack is a "legacy collection" that dates all the way back to Java 1.0. It extends java.util.Vector which is indeed array-based. However, this is generally regarded as bad object design. That doesn't mean that an array-based stack is a bad thing, but you should be aware that just because something is in the JDK doesn't mean that it's a good idea. This is particularly true of the older APIs.

A more modern stack-like data structure is java.util.ArrayDeque. It's also array-based. It has a bunch of other features, but if you stick to its push and pop methods (equivalent to addFirst and removeFirst), it's basically a stack. Note that in its documentation it says,

This class is likely to be faster than Stack when used as a stack.

If you look at the implementations, Stack, like Vector, is synchronized, so that can slow it down a bit. Stack's push and pop methods are implemented in terms of Vector methods, which are also synchronized. This means additional method calls plus nested synchronization. (The JIT can probably optimize away most of this, though.) By contrast, ArrayDeque is not synchronized, and its stack-like methods use simple operations that operate directly on its internal array. Note, I haven't done any benchmarking here to validate the documentation's claims.

In any case, ArrayDeque is the preferred Java collection to use for problems that require stack-like behavior.

But you were asking about linked data structures as opposed to array-based ones. Let's compare ArrayDeque to another Java linked data structure, LinkedList. This implements Deque so it can also be used as a stack. You said,

1) no random access is needed.

True. Note that ArrayDeque doesn't offer random access, even though it's array-based. No advantage to either.

2) arrays are inefficient because they have to be resized (waste of time) and also they use storage inefficiently (some space is always wasted)

Any data structure will have some inefficiencies. Different data structures will have different tradeoffs, though. If ArrayDeque's array isn't sized for the typical capacity of the stack, it'll have to be resized. But once the array is big enough, it doesn't need to be resized continually. If the stack shrinks, the array will still consume space that's empty. This could be viewed as a waste, or it could be viewed as holding memory in reserve so that it doesn't have to be resized and copied if the stack grows again.

Compare the situation to LinkedList. Internally, each list element requires a Node instance. (See source here.) Each instance contains three references: one to the data element, one to the next Node, and one to the previous Node. Assuming a 64-bit JVM with compressed pointers, that's 32 bits per reference. Each object has a header containing a 64-bit mark word and a 32-bit class pointer. That gives a total of six 32-bit words, or 24 bytes per list element. Only one of the six words is the "payload" -- the element itself -- so that's 20 bytes or 83% overhead per element!

By contrast, each additional element in an array consumes only the space for the reference to that element, or 32 bits.

For example, storing 1,000 elements in a LinkedList consumes about 24K of memory, but storing them in an ArrayDeque consumes only about 4K of memory. Even if the array is twice as big as it needs to be to hold the 1,000 elements, that's only 8K -- still only a fraction of the size of the LinkedList.

"An array based implementation on the other hand may have better runtime behavior"

This is probably referring to the traversing the linked list being slower than traversing the array. There are two reasons for this. First, the link nodes occupy more memory, so more memory must be read in order to get the same information. At 24 bytes per node, 2.67 nodes can fit in a typical 64-byte cache line. Second, the link nodes are likely to be spread around memory somewhat randomly, so on average there may be fewer nodes than this per cache line. The result is that traversing a linked list will cause a lot of cache misses because of this poor locality of reference.

On the other hand, since references in an array are densely packed with no overhead, 16 array elements can fit within a single 64-byte cache line. Traversing an array will cause many fewer cache misses. Furthermore, memory subsystems optimize for sequential access, so they might be able to prefetch the next cache line, reducing memory overhead even further.

Considering memory consumption and the performance cost of memory access, array-based structures are usually preferable. There may be some cases where linked structures perform better, but there seem to be fewer than most people think.

Setting performance aside, there is one clear advantage to a linked structure over an array structure for a stack: immutable persistence. Pushing and popping elements on an array-based stack inherently mutates the array, so previous versions no longer exist. Pushing and popping nodes on a linked structure doesn't need to mutate any of the linked nodes, so references to past states of the stack can be held persistently and will remain unchanged, even if somebody else pushes and pops elements on this stack. Effectively they are different stacks, with the common portions shared. This is extremely useful in functional programming contexts.

Stack ADT (Abstract Data Type) implementation- Array vs linked

Answers (1)

Related Questions