Reputation: 2802
Can anyone explain why databases tend to use b-tree indexes rather than a linked list of ordered elements.
My thinking is this: On a B+ Tree (used by most databases), the none-leaf nodes are a collection of pointers to other nodes. Each collection (node) is a ordered list. The leaf nodes, which is where all the data pointers are, is a linked list of clusters of data pointers.
The non-leaf nodes are just used to find the correct leaf node in which your target data pointer lives. So as the leaf nodes are just like a linked list, then why not just do away with the tree elements and just have the linked list. Meta data can be provided which gives the minimum and maximum value of each leaf node cluster, so the application can just read the meta data and find the correct leaf where the data pointer lives.
Just to be clear that the most efficent algorithm for searching an random accessed ordered list is an binary search which has a performance of O(log n) which is the same as a b-tree. The benifit of using a linked list rather than a tree is that they don't need to be ballanced.
Is this structure feasible.
Upvotes: 4
Views: 9265
Reputation: 2802
After some research and paper reading I found the answer.
In order to cope with large amounts of data such a millions of records, indexes have to be organised into clusters. A cluster is a continuous group of sectors on a disk that can be read into memory quickly. These are usually about 4096 bytes long.
Each one of these clusters can contain a bunch of indexes which can point to other clusters or data on a disk. So if we had a linked list index, each element of the index would be made up of the collection of indexes contained in a single cluster (say 100).
So, when we are looking for a specific record, how do we know which cluster it is on. We perform a binary search to find the cluster in question [O(log n)].
However, to do a binary search we need to know where the range of values in each clusters, so we need meta-data that says the min and max value of each cluster and where that cluster is. This is great. Except if each cluster can contain 100 indexes, and our meta data is also held on a single cluster (for speed) , then our meta data can only point to 100 clusters.
What happens if we want more than 100 clusters. We have to have two meta-data indexes, each pointing to 100 clusters (10 000 records). Well that’s not enough. Lets add another meta-data cluster and we can now access 1 000 000 records. So how do we know which one of the three meta-data clusters we need to query in order to find our target data cluster. We could search one then the other, but that doesn’t scale. So I add another meta-meta-data cluster to indicate which one of the three meta-data clusters I should query to find the target data cluster. Now I have a tree!
So that’s why databases use trees. It’s not the speed it’s the size of the indexes and the need to have indexes referencing other indexes. What I have described above is a B+Tree – child nodes contain references to other child nodes or leaf nodes, and leaf nodes contain references to data on disk.
Phew!
Upvotes: 17
Reputation: 8706
I guess I answered that question in Chapter 1 of my SQL Indexing Tutorial: http://use-the-index-luke.com/sql/anatomy
To summarize the most important parts, with respect to your particular question:
-- from "The Leaf Nodes"
The primary purpose of an index is to provide an ordered representation of the indexed data. It is, however, not possible to store the data sequentially because an insert statement would need to move the following entries to make room for the new one. But moving large amounts of data is very time-consuming, so that the insert statement would be very slow. The problem's solution is to establish a logical order that is independent of physical order in memory.
-- from "The B-Tree":
The index leaf nodes are stored in an arbitrary order—the position on the disk does not correspond to the logical position according to the index order. It is like a telephone directory with shuffled pages. If you search for “Smith” in but open it at “Robinson” in the first place, it is by no means granted that Smith comes farther back. Databases need a second structure to quickly find the entry among the shuffled pages: a balanced search tree—in short: B-Tree.
Upvotes: 5
Reputation: 8134
Linked lists are usually not ordered by key value, but by the moment of insertion: insertion is done at the end of list and each new entry contains a pointer to the previous entry of the list.
They are usually implemented as heap structures.
This has 2 main benefits:
they are very easy to manage (you just need a pointer for each element)
if used in combination with an index you can overcome the problem of sequential access.
If instead you use an ordered list, by key value, you will have ease of access (binary search), but encounter problems each time you edit, delete, insert a new element: you must infact keep your list ordered after performing operation, making algorithms more complex and time consuming.
B+ trees are better structures, having all the properties you stated, and other advantages:
you can make group searches (by intervals of key values) with same cost of a single search: since elements in the leafs result automatically ordered thanks to the insertion algorithm, which is not possible in linked lists cause it would require many linear searches over the list.
cost is logarithmic with number of elements contained and especially since these structures are kept balanced cost of access does not depend on the particulare value you are looking for (very usefull).
these structures are very efficient in update, insert or delete operations.
Upvotes: 2