Reputation: 490178
Most times I see people try to use linked lists, it seems to me like a poor (or very poor) choice. Perhaps it would be useful to explore the circumstances under which a linked list is or is not a good choice of data structure.
Ideally, answers would expound on the criteria to use in selecting a data structure, and which data structures are likely to work best under specified circumstances.
Edit: I must say, I'm quite impressed by not only the number, but the quality of answers. I can only accept one, but there are two or three more I'd have to say would have been worth accepting if something a bit better hadn't been there. Only a couple (especially the one I ended up accepting) pointed to situations where a linked list provided a real advantage. I do think Steve Jessop deserves some sort of honorable mention for coming up with not just one, but three different answers, all of which I found quite impressive. Of course, even though it was posted only as a comment, not an answer, I think Neil's blog entry is well worth reading as well -- not only informative, but quite entertaining as well.
Upvotes: 137
Views: 33475
Reputation: 494
Let me give you some examples. :)
Most answers here focus on the traits of the pedagogical linked list, but in practice the idea is commonly used in hybrid forms: 1) the list nodes carry additional information (or the other way around: the nodes are embedded in other objects), or 2) the nodes are "chunked", also known as "paged" or "segmented".
Rope strings are a good example to start with. They're a popular data structure in text editors.
Recall that a text editor allows you to jump around the text buffer, and insert at any position. This is done efficiently with string splitting. Given a compact string:
std::string text = "Hello world!";
When you insert a comma into the sentence, the string is split into two parts, and the comma you insert sits between them in a linked list:
"Hello" -> "," -> " world!"
In time, the rope string may compact some segments to reduce memory fragmentation, but you can think of it as a "lazy linked list" where the characters are not split into a real node until you insert something at that point. You can imagine a naive rope string "abc" without compaction expanding into a pedagogical linked list once you insert an 'x' before every character:
original: "abc"
after-insertion: "x" -> "a" -> "x" -> "b" -> "x" -> "c"
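To make the mechanics concrete, here is a toy C++ sketch of the segment idea (all names are mine, not taken from any real editor): the buffer is a singly linked list of string pieces, and insertion splits one piece locally instead of shifting the whole buffer.

#include <cstddef>
#include <memory>
#include <string>

// A buffer piece; pieces form a singly linked list.
struct Segment {
    std::string text;
    std::unique_ptr<Segment> next;
};

// Inserts `piece` inside `seg` at `pos`: splits `seg` and links the new
// piece in between. No other segment is touched.
void insertAt(Segment& seg, std::size_t pos, std::string piece) {
    auto tail = std::make_unique<Segment>(
        Segment{seg.text.substr(pos), std::move(seg.next)});
    seg.text.erase(pos); // keep only the part before the split point
    seg.next = std::make_unique<Segment>(
        Segment{std::move(piece), std::move(tail)});
}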
A skip list is another great example.
I won't expand too much on what skip lists are, but a skip list is equivalent to a binary search tree (a treap, specifically), and its paged version is equivalent to a B-tree, so it's very efficient even with mingled searches and insertions. Its locality, and the fact that you don't have to rotate it like you do an RB-tree, make it superior for concurrency and implementation simplicity. This data structure is often the cornerstone of sorted maps, such as the one in Redis.
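To give a feel for the shape of it, here is a minimal sketch of a skip-list search in C++ (names are mine); insertion additionally assigns each new node a random tower height, which is what keeps searches logarithmic in expectation.

#include <vector>

// A node carries a tower of forward pointers; forward[i] is the next
// node at level i, and taller towers skip over more of the list.
struct SkipNode {
    int key;
    std::vector<SkipNode*> forward;
};

// Searches from a sentinel `head` whose tower has the maximum height
// (assumed >= 1): run along each level, then drop down a level.
SkipNode* find(SkipNode* head, int key) {
    SkipNode* x = head;
    for (int lvl = static_cast<int>(head->forward.size()) - 1; lvl >= 0; --lvl)
        while (x->forward[lvl] && x->forward[lvl]->key < key)
            x = x->forward[lvl]; // go as far as this level allows
    x = x->forward[0]; // the candidate node at the bottom level
    return (x && x->key == key) ? x : nullptr;
}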
Intrusive linked lists are valuable for inversion of memory control, where you need to traverse an indefinite number of elements but don't want to allocate the resources to hold them yourself. For example, when the user can register a series of callbacks, you can ask them to inherit from your base node class:
// In framework code
class BaseHandler {
    BaseHandler* next;
    friend void registerHandler(BaseHandler* handler) noexcept;
public:
    constexpr BaseHandler(BaseHandler* next = nullptr) noexcept : next(next) {}
};

// Returns a reference to the global list head (one list per program here).
BaseHandler*& getCurrentHeadHandler() noexcept {
    static BaseHandler* head = nullptr;
    return head;
}

void registerHandler(BaseHandler* handler) noexcept {
    auto& head = getCurrentHeadHandler();
    handler->next = head; // prepend: the new handler links to the old head
    head = handler;
}

// In user code
class MyJoystickHandler : public BaseHandler {
    int other_field;
};
You can see that instead of creating a new type Node<T> that holds user data as a field, we're asking the user to mix our base fields into their own objects, which is why this technique is called "intrusive".
This trick is very useful if you want a flexible memory scheme where some nodes might be not only freshly allocated on the heap but also embedded in another object, placed on the stack, or even in the BSS segment. It's often efficient if the user already has some space to utilize. It's commonly seen in event-driven programming, timer tasks, and OS kernels.
Now you can understand that the general idea behind linked lists is to connect segments with links instead of packing them together, so your local changes don't have to touch other parts. This enables great concurrency and memory locality. Forcing yourself to create a node for each single int is not a practical usage of this amusing and powerful data structure. Once you understand the idea, you will be able to apply its extended forms to different use cases.
Upvotes: 1
Reputation: 19050
I know this is not a direct answer to the question, but Python's standard implementation of the deque abstract data type, found in collections.deque, uses a doubly linked list (in CPython, a doubly linked list of fixed-size blocks).
Upvotes: 1
Reputation: 4747
Another advantage a linked list has over a contiguous collection is that a linked list can allocate the object wherever it fits.
When you have thousands if not millions of objects in contiguous memory, it becomes difficult to reallocate the memory - at some point you need twice or more the memory if the contiguous collection needs to reallocate.
With a linked list you have small allocations that can fit in "gaps" of unused memory.
Upvotes: 1
Reputation:
One of the most useful cases I find for linked lists, working in performance-critical fields like mesh and image processing, physics engines, and raytracing, is when using linked lists actually improves locality of reference and reduces heap allocations, and sometimes even reduces memory use, compared to the straightforward alternatives.
Now that can seem like a complete paradox, since linked lists are notorious for often doing the opposite, but they have a unique property: each list node has a fixed size and fixed alignment requirements, which we can exploit to store nodes contiguously and remove them in constant time in ways that variable-sized things cannot.
As a result, let's take a case where we want to store the analogical equivalent of a variable-length sequence which contains a million nested variable-length sub-sequences. A concrete example is an indexed mesh storing a million polygons (some triangles, some quads, some pentagons, some hexagons, etc.), where sometimes polygons are removed from anywhere in the mesh and sometimes polygons are rebuilt to insert a vertex into an existing polygon or remove one. In that case, if we store a million tiny std::vector instances, then we end up facing a heap allocation for every single vector as well as potentially explosive memory use. A million tiny SmallVectors might not suffer this problem as much in common cases, but their preallocated buffer, which isn't separately heap-allocated, might still cause explosive memory use.
The problem here is that a million std::vector instances would be trying to store a million variable-length things. Variable-length things tend to want a heap allocation, since they cannot very effectively be stored contiguously and removed in constant time (at least in a straightforward way without a very complex allocator) unless they store their contents elsewhere on the heap.
If, instead, we do this:
#include <vector>

struct FaceVertex
{
    // Points to next vertex in polygon or -1
    // if we're at the end of the polygon.
    int next;
    // ... other per-vertex attributes
};
struct Polygon
{
    // Points to first vertex in polygon.
    int first_vertex;
    // ... other per-polygon attributes
};
struct Mesh
{
    // Stores all the face vertices for all polygons.
    std::vector<FaceVertex> fvs;
    // Stores all the polygons.
    std::vector<Polygon> polys;
};
... then we've dramatically reduced the number of heap allocations and cache misses. Instead of requiring a heap allocation and potentially compulsory cache misses for every single polygon we access, we now only require a heap allocation when one of the two vectors stored in the entire mesh exceeds its capacity (an amortized cost). And while the stride to get from one vertex to the next might still cause its share of cache misses, it's still often less than if every single polygon stored a separate dynamic array, since the nodes are stored contiguously and there's a probability that a neighboring vertex will be accessed prior to eviction (especially considering that many polygons add their vertices all at once, which makes the lion's share of polygon vertices perfectly contiguous).
Here is another example: a particle grid, where the grid cells are used to accelerate particle-particle collision for, say, 16 million particles moving every single frame. In that particle grid, using linked lists we can move a particle from one grid cell to another by just changing 3 indices. Erasing from a vector and pushing back to another can be considerably more expensive and introduce more heap allocations. The linked lists also reduce the memory of a cell down to 32 bits. A vector, depending on the implementation, can preallocate its dynamic array to the point where an empty vector takes 32 bytes. If we have around a million grid cells, that's quite a difference.
... and this is where I find linked lists most useful these days. I specifically find the "indexed linked list" variety useful, since 32-bit indices halve the memory requirements of the links on 64-bit machines and they imply that the nodes are stored contiguously in an array.
Often I also combine them with indexed free lists to allow constant-time removals and insertions anywhere. In that case, the next index either points to the next free index if the node has been removed, or to the next used index if the node has not been removed.
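Here is a rough sketch of that combination (the names and layout are my own, purely to illustrate the scheme): all nodes of all sub-lists live contiguously in one vector, and the free list is woven through the removed slots.

#include <vector>

// One element of a sub-sequence; `next` is an index, not a pointer.
struct IndexedNode {
    int value;
    int next; // next used index while linked in, next free index once removed
};

class IndexedLists {
    std::vector<IndexedNode> nodes; // one contiguous pool for every sub-list
    int free_head = -1;             // head of the free chain, -1 if none

public:
    // Prepends `value` to the sub-list starting at `head` (-1 if empty)
    // and returns the index of the new head: O(1).
    int push_front(int head, int value) {
        int index = free_head;
        if (index != -1)
            free_head = nodes[index].next; // reuse a freed slot
        else {
            index = static_cast<int>(nodes.size());
            nodes.push_back({});           // grow the contiguous pool
        }
        nodes[index] = {value, head};
        return index;
    }

    // Removes the head of a sub-list, hands the slot to the free chain,
    // and returns the new head: O(1), no shifting, no heap call.
    int pop_front(int head) {
        int next = nodes[head].next;
        nodes[head].next = free_head;
        free_head = head;
        return next;
    }

    const IndexedNode& node(int i) const { return nodes[i]; }
};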
And this is the number one use case I find for linked lists these days. When we want to store, say, a million variable-length sub-sequences averaging, say, 4 elements each (but sometimes with elements being removed from and added to one of these sub-sequences), the linked list allows us to store 4 million linked-list nodes contiguously instead of 1 million containers which are each individually heap-allocated: one giant vector, not a million small ones.
Upvotes: 5
Reputation: 2105
One example of good usage for a linked list is where the list elements are very large, i.e., large enough that only one or two can fit in the CPU cache at the same time. At this point the advantage that contiguous-block containers like vectors or arrays have for iteration is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in real time.
Upvotes: 3
Reputation: 186
Consider that a linked list might be very useful in a Domain Driven Design style implementation of a system that includes parts that interlock with repetition.
An example that comes to mind might be if you were to be modelling a hanging chain. If you wanted to know what the tension on any particular link was, your interface could include a getter for "apparent" weight. The implementation of which would include a link asking its next link for its apparent weight, then adding its own weight to the result. This way, the whole length down to the bottom would be evaluated with a single call from the chain's client.
Being a proponent of code that reads like natural language, I like how this would let the programmer ask a chain link how much weight it's carrying. It also keeps the concern of calculating these kinds of properties within the boundary of the link implementation, removing the need for a "chain weight calculating service".
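A minimal sketch of that chain-link idea (names are illustrative, not from any real codebase): each link asks the next link for its apparent weight and adds its own, so one call from the client walks the whole chain.

struct ChainLink {
    double own_weight;
    ChainLink* next; // nullptr at the bottom of the chain

    // The weight this link carries: everything hanging below it, plus itself.
    double apparentWeight() const {
        double below = next ? next->apparentWeight() : 0.0;
        return below + own_weight;
    }
};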
Upvotes: 1
Reputation: 45101
There are two complementary operations which are trivially O(1) on lists (given a pointer to the node) and very hard to implement in O(1) in other data structures: removing and inserting an element at an arbitrary position, assuming you need to maintain the order of elements.
Hash maps can obviously do insertion and deletion in O(1) but then you cannot iterate over the elements in order.
Given the fact above, a hash map can be combined with a linked list to create a nifty LRU cache: a map that stores a fixed number of key-value pairs and drops the least recently accessed key to make room for new ones.
The entries in the hash map need to have pointers to the linked list nodes. When accessing the hash map, the linked list node is unlinked from its current position and moved to the head of the list (O(1), yay for linked lists!). When there's need to remove the least recently used element, the one from the tail of the list needs to be dropped (again O(1) assuming you keep the pointer to the tail node) together with the associated hash map entry (so backlinks from the list to the hash map are necessary.)
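A minimal sketch of such an LRU cache in C++ (names are mine; it assumes capacity > 0), using std::list for the doubly linked list and std::unordered_map for the index; std::list::splice is the O(1) "unlink and move to the head" operation described above.

#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>

template <typename K, typename V>
class LruCache {
    std::size_t capacity;
    std::list<std::pair<K, V>> order; // front = most recently used
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index;

public:
    explicit LruCache(std::size_t cap) : capacity(cap) {}

    std::optional<V> get(const K& key) {
        auto it = index.find(key);
        if (it == index.end()) return std::nullopt;
        // O(1) relink: move the node to the front without invalidating it.
        order.splice(order.begin(), order, it->second);
        return it->second->second;
    }

    void put(const K& key, const V& value) {
        if (auto it = index.find(key); it != index.end()) {
            it->second->second = value;
            order.splice(order.begin(), order, it->second);
            return;
        }
        if (order.size() == capacity) {
            // Drop the least recently used entry from the tail: O(1).
            index.erase(order.back().first);
            order.pop_back();
        }
        order.emplace_front(key, value);
        index[key] = order.begin();
    }
};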
Upvotes: 1
Reputation: 10658
Linked lists are one of the natural choices when you cannot control where your data is stored, but you still need to somehow get from one object to the next.
For example, when implementing memory tracking in C++ (a new/delete replacement), you either need some control data structure that keeps track of which pointers have been freed, which you have to implement entirely yourself, or you overallocate and embed a linked-list node at the beginning of each data chunk.
Because you always know immediately where you are in the list when delete is called, you can easily give up memory in O(1). Adding a newly malloced chunk is also O(1). Walking the list is very rarely needed in this case, so the O(n) cost is not an issue here (walking a structure is O(n) anyway).
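A rough sketch of the overallocation approach (all names are assumed; it ignores over-aligned types for brevity): each block is prefixed with a doubly linked header so that the tracker can unlink it in O(1) when delete is called.

#include <cstddef>
#include <cstdlib>
#include <new>

struct BlockHeader {
    BlockHeader* prev;
    BlockHeader* next;
    std::size_t size;
};

BlockHeader* g_blocks = nullptr; // head of the list of live allocations

void* trackedAlloc(std::size_t size) {
    auto* h = static_cast<BlockHeader*>(std::malloc(sizeof(BlockHeader) + size));
    if (!h) throw std::bad_alloc{};
    h->size = size;
    h->prev = nullptr;
    h->next = g_blocks; // O(1) prepend
    if (g_blocks) g_blocks->prev = h;
    g_blocks = h;
    return h + 1; // user data starts right after the header
}

void trackedFree(void* p) {
    auto* h = static_cast<BlockHeader*>(p) - 1;
    // We already know where we are in the list, so unlinking is O(1).
    if (h->prev) h->prev->next = h->next; else g_blocks = h->next;
    if (h->next) h->next->prev = h->prev;
    std::free(h);
}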
Upvotes: 4
Reputation: 5681
From my experience: implementing sparse matrices and Fibonacci heaps. Linked lists give you more control over the overall structure of such data structures. Though I'm not sure whether sparse matrices are best implemented using linked lists (probably there is a better way), they really helped me learn the ins and outs of sparse matrices in undergrad CS :)
Upvotes: 2
Reputation: 11638
They can be useful for concurrent data structures. (There is now a non-concurrent real-world usage sample below - that would not be there if @Neil hadn't mentioned FORTRAN. ;-)
For example, ConcurrentDictionary<TKey, TValue> in .NET 4.0 RC uses linked lists to chain items that hash to the same bucket.
The underlying data structure for ConcurrentStack<T> is also a linked list.
ConcurrentStack<T> is one of the data structures that serve as the foundation for the new Thread Pool (with the local "queues" implemented as stacks, essentially). (The other main supporting structure is ConcurrentQueue<T>.)
The new Thread Pool in turn provides the basis for the work scheduling of the new Task Parallel Library.
So they can certainly be useful - a linked list is currently serving as one of the main supporting structures of at least one great new technology.
(A singly-linked list makes a compelling lock-free - but not wait-free - choice in these cases, because main operations can be carried out with a single CAS (+retries). In a modern GC-d environment - such as Java and .NET - the ABA problem can easily be avoided. Just wrap items you add in freshly created nodes and do not reuse those nodes - let the GC do its work. The page on the ABA problem also provides the implementation of a lock-free stack - that actually works in .Net (&Java) with a (GC-ed) Node holding the items.)
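For illustration, here is a minimal Treiber-style stack sketch in C++ along those lines (names are mine). Unlike the GC-d case described above, this sketch deliberately leaks popped nodes, because safe reclamation is exactly the ABA-related problem a GC makes disappear.

#include <atomic>
#include <utility>

template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head{nullptr};

public:
    void push(T value) {
        Node* n = new Node{std::move(value), head.load()};
        // Single CAS (+retries): swing head from our snapshot to the new node.
        while (!head.compare_exchange_weak(n->next, n)) {}
    }

    bool pop(T& out) {
        Node* n = head.load();
        // On failure, n is refreshed with the current head; retry until
        // the stack is empty or the CAS succeeds.
        while (n && !head.compare_exchange_weak(n, n->next)) {}
        if (!n) return false;
        out = std::move(n->value);
        // Node intentionally not deleted: see the reclamation note above.
        return true;
    }
};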
Edit:
@Neil:
actually, what you mentioned about FORTRAN reminded me that the same kind of linked lists can be found in probably the most used and abused data structure in .NET: the plain .NET generic Dictionary<TKey, TValue>.
Not one, but many linked lists are stored in an array (one for each bucket used). A free list of reusable nodes is "interwoven" between them (if there were deletes). An array is allocated at the start/on rehash, and the nodes of the chains are kept in it. There is also a free pointer - an index into the array - that follows deletes. ;-) So - believe it or not - the FORTRAN technique still lives on. (...and nowhere else than in one of the most commonly used .NET data structures ;-).
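A sketch of that technique in C++ (names are my own; it skips duplicate-key checks and rehashing for brevity): every bucket's chain, plus the free list, lives in one contiguous array of entries linked by indices.

#include <cstddef>
#include <functional>
#include <vector>

template <typename K, typename V>
class ChainedTable {
    struct Entry { K key; V value; int next; }; // next: chain or free list
    std::vector<int> buckets;   // bucket -> index of first entry, or -1
    std::vector<Entry> entries; // all chains interwoven in one array
    int free_head = -1;         // "free pointer": first reusable slot

    int bucketOf(const K& key) const {
        return static_cast<int>(std::hash<K>{}(key) % buckets.size());
    }

public:
    explicit ChainedTable(std::size_t n) : buckets(n, -1) {}

    void insert(const K& key, const V& value) {
        int b = bucketOf(key);
        int slot = free_head;
        if (slot != -1)
            free_head = entries[slot].next;   // reuse a deleted slot
        else {
            slot = static_cast<int>(entries.size());
            entries.push_back({});
        }
        entries[slot] = {key, value, buckets[b]};
        buckets[b] = slot;                    // prepend to the bucket's chain
    }

    V* find(const K& key) {
        for (int i = buckets[bucketOf(key)]; i != -1; i = entries[i].next)
            if (entries[i].key == key) return &entries[i].value;
        return nullptr;
    }

    bool erase(const K& key) {
        for (int* link = &buckets[bucketOf(key)]; *link != -1;
             link = &entries[*link].next) {
            if (entries[*link].key == key) {
                int dead = *link;
                *link = entries[dead].next;     // unlink from the chain
                entries[dead].next = free_head; // weave into the free list
                free_head = dead;
                return true;
            }
        }
        return false;
    }
};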
Upvotes: 45
Reputation: 4534
Arrays are the data structures to which Linked Lists are usually compared.
Normally, linked lists are useful when you have to make a lot of modifications to the list itself, while arrays perform better than lists on direct element access.
Here's a list of operations that can be performed on lists and arrays, compared with the relative operation cost (n = list/array length):

Operation                            Array            Linked list
Access element by index              O(1)             O(n)
Insert/delete at the beginning       O(n)             O(1)
Insert/delete at the end             O(1) amortized   O(1) (doubly-linked, with a tail pointer)
Insert/delete at a known position    O(n)             O(1)
This is a very low-level comparison of these two popular and basic data structures: lists perform better in situations where you have to make a lot of modifications to the list itself (removing or adding elements), while arrays perform better when you have to access the elements directly.
From the point of view of memory allocation, lists are better because there's no need to have all the elements next to each other. On the other hand, there's the (small) overhead of storing the pointers to the next (or even the previous) element.
Knowing these differences is important for developers choosing between lists and arrays in their implementations.
Note that this is a comparison of lists and arrays. There are good solutions to the problems reported here (e.g. skip lists, dynamic arrays, etc.). In this answer I've taken into account only the basic data structures every programmer should know about.
Upvotes: 16
Reputation: 279285
Singly-linked lists are the obvious implementation of the common "list" data type in functional programming languages:
(append (list x) L) and (append (list y) L) can share almost all of their data. No need for copy-on-write in a language with no writes. Functional programmers know how to take advantage of this.
By comparison, a vector or deque would typically be slow to add at either end, requiring (at least in my example of two distinct appends) that a copy be taken of the entire list (vector), or of the index block and the data block being appended to (deque). Actually, there may be something to be said there for deque on large lists which do need to add at the tail for some reason; I'm not sufficiently informed about functional programming to judge.
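The same sharing can be sketched in C++ with reference-counted cons cells (names are mine; a real functional runtime would use its GC instead of shared_ptr):

#include <memory>
#include <utility>

template <typename T>
struct ConsCell {
    T head;
    std::shared_ptr<ConsCell> tail;
};

template <typename T>
using List = std::shared_ptr<ConsCell<T>>;

// Builds a new list whose tail is shared, never copied.
template <typename T>
List<T> cons(T head, List<T> tail) {
    return std::make_shared<ConsCell<T>>(
        ConsCell<T>{std::move(head), std::move(tail)});
}

// Usage: xs and ys share every node of base.
//   List<int> base = cons(2, cons(3, List<int>{}));
//   List<int> xs = cons(1, base);
//   List<int> ys = cons(9, base);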
Upvotes: 3
Reputation: 279285
A doubly-linked list is a good choice for defining the ordering of a hashmap which also defines an order on the elements (LinkedHashMap in Java), especially when ordered by last access.
Sure, you can argue about whether an LRU cache is a good idea in the first place, compared with something more sophisticated and tuneable, but if you're going to have one, this is a fairly decent implementation. You do not want to perform a delete-from-middle-and-add-to-the-end on a vector or deque at every read access, but moving a node to the tail is typically fine.
Upvotes: 4
Reputation: 279285
A singly-linked list is a good choice for the free list in a cell allocator or object pool.
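A compact sketch of such a free list (all names assumed): freed cells are linked through their own storage, so the pool spends no extra memory on bookkeeping and both allocate and free are O(1).

#include <cstddef>
#include <new>

template <typename T, std::size_t N>
class Pool {
    union Cell {
        Cell* next;                                  // link while the cell is free
        alignas(T) unsigned char storage[sizeof(T)]; // payload while in use
    };
    Cell cells[N];
    Cell* free_head;

public:
    Pool() : free_head(cells) {
        for (std::size_t i = 0; i + 1 < N; ++i)
            cells[i].next = &cells[i + 1]; // thread every cell onto the free list
        cells[N - 1].next = nullptr;
    }

    void* allocate() {
        if (!free_head) throw std::bad_alloc{};
        Cell* c = free_head;
        free_head = c->next; // pop from the singly linked free list: O(1)
        return c->storage;
    }

    void deallocate(void* p) {
        Cell* c = static_cast<Cell*>(p);
        c->next = free_head; // push the cell back: O(1)
        free_head = c;
    }
};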
Upvotes: 4
Reputation: 57902
Linked lists are very useful when you need to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile-time) length.
Splitting and joining (bidirectionally-linked) lists is very efficient.
You can also combine linked lists - e.g. tree structures can be implemented as "vertical" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).
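For instance, that "vertical plus horizontal" linking is just the classic left-child/right-sibling representation (a minimal sketch, names mine):

struct TreeNode {
    int value;
    TreeNode* first_child = nullptr;  // vertical link: parent -> children
    TreeNode* next_sibling = nullptr; // horizontal link: sibling list
};

// Sums every value by walking child links down and sibling links across.
int sum(const TreeNode* n) {
    if (!n) return 0;
    return n->value + sum(n->first_child) + sum(n->next_sibling);
}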
Using an array-based list for these purposes has severe limitations:

- Adding a new item means the array must be reallocated (or you must over-allocate in anticipation of future growth).
- Removing an item leaves wasted space or requires shifting every later element down one position.
- Inserting an item anywhere except the end likewise means shifting every later element up one position.
Upvotes: 60
Reputation: 37778
Linked lists are very flexible: with the modification of one pointer, you can make a massive change, where the same operation would be very inefficient in an array list.
Upvotes: 22
Reputation: 798744
They're useful when you need high-speed push, pop and rotate, and don't mind O(n) indexing.
Upvotes: 4