Is a Python list the same list as defined by my data structures and algorithms textbook?

Question

I am reading a book about data structures and basic algorithms to get a better grasp on programming. The book I am reading defines some data structures, e.g. list, array, etc, and specifies the interfaces to these, and also how they can be implemented (or constructed) using existing data structures in different languages. For example, it discusses how a list can be constructed as an array or as linked cells.

However, when I start looking at these data types in different languages I find that they do not always follow the interfaces that are defined in the book.

For example, the book says that in a list you should be able to step through the different elements one at the time, starting only at one of the ends. For example, if I want the third element, I would have to start with the first element and use a "next" operation until I get to the element I want. But lists in Python have integer indices and I can directly access any element directly by using the corresponding index. In the book, the index type only has to be an ordered type with a comparison operation for equality. (I think the book tries to stay general by abstracting things).

Does this mean that Python's lists are not lists in the "true sense"? Or is it that the data structures defined in the book are only suggestions on how the interfaces for these can look, and in practice there can be a lot of variation?

I understand that the idea about the book is to get a better understanding for how to think about data structures, and to introduce some typical ones, so I am inclined to think that the latter answer is the correct one.

ggorlen · Accepted Answer

There are multiple names for many data structures (hash[map/table]/dictionary/associative array comes to mind). You're referring to two data structures which are unrelated beyond the fact that they both store a linear sequence of items. The name similarity is an unfortunate source of confusion, but the TL;DR is that lists in Python's terminology are very similar to, and are built upon, arrays while linked lists, which your text refers to as "lists", aren't.

It's important to separate implementation from interface: the underlying data structure behind a queue, for example, could be an array or a linked list, but this should be an implementation detail that the client of the interface doesn't know or care about. The interface should enforce guarantees for the time complexity of various operations. It's common for data structure texts to implement abstract data types using various underlying "primitive" data structures like arrays or linked lists, many of which may impact the resulting complexity of various operations (prompting student analysis).

Linked lists are simple, dynamic (expandible) data structures that offer fast O(1) add and removal operations on both ends. Their nodes may be doubly-linked and are usually structs or objects in memory with pointers or references to next and (if doubly-linked) previous elements as well as a data property per node. There is no random access for linked lists--you have to walk through the list from an endpoint to locate a single element by following the pointers.

Linked lists are typically used to implement queues, stacks and more complex data structures like Java's LinkedHashMap, which combines a doubly linked list and a hash map. Python offers collections.deque and Queue which are linked list-based data structures offering fast access to front and back elements. Other libraries, like Java's ArrayDeque, use circular arrays to implement the same deque interface.

Tree nodes are a variant of singly linked list nodes with two child pointers rather than a single next pointer. As with linked lists (and their nodes), typically tree nodes and trees will be an implementation detail behind an interface such as the C++ map. You can implement trees with arrays, for example, as is typically the case with heaps.

Python's lists (also known as ArrayLists (Java), arrays (JS/Ruby), vectors (C++/Haskell) and List (C#)) are a dynamic abstraction on primitive arrays, which are fixed in size. The list abstraction adds a length property and many functions for manipulating the underlying array like append/push/push_back, pop, shift, splice, etc (names are dependent on language). The underlying array will be automatically resized to fit the number of elements it contains. Inserting or removing elements at the front or middle of the list is an O(n) operation since as many as all elements need to be shifted to accommodate the adjustment to the array. This shifting is part of the abstraction and is hidden to the client.

The advantages of lists are the same as those of arrays: fast random access to any element. Additions to the end of a list are also fast and constant-time because the occasional reallocations of the underlying array are amortized across multiple append operations.

An important consideration for lists versus primitive arrays is the memory scheme. Python lists are comprised of objects and suffer from many of the problems of linked lists in that pointers to heap-allocated data need to be referenced and may have poor locality. Using a numpy array provides the advantages of a C array in that you get chunks of sequential memory which benefit from fast access (sweeping a memory-aligned offset over the data) and locality. Increased overhead is the typical cost of abstractions like lists and it can have a major performance impact on certain applications.

To add to the confusion, at least two high level languages, C++ and Haskell, have a "list" data structure, but these are actually linked lists rather than dynamic array(list)s.

Regardless of the language you're using, it's important to distinguish what the data structures you're using really are (in terms of time complexity for typical operations, primarily) to avoid inadvertent misuse and select the correct tool for the job.

Is a Python list the same list as defined by my data structures and algorithms textbook?

Answers (2)

Related Questions