Rahul Patel
Rahul Patel

Reputation: 65

Lucene index modeling - Why are skiplists used instead of btree?

I have recently started learning lucene and came to know about how lucene stores and queries indices. Lucene seems to be using skip list as an underlying data structure. However, I did not find any reason to use skip list over a binary tree.

The advantage with skip lists is that it provides good performance when being used concurrently. And lucene allows single writer thread per index and readers read from immutable segments, so skip list is not helping here either. Other than that binary tree (self balancing) trumps skip list - since it provides worst case complexity of O(logn) for reading and writing whereas skip list provides same time complexity in average case. Also, binary tree would serve range queries in better time compared to skip list. For serving a conjunction query as well, lucene uses skip lists of multiple postings list to find their intersection - for this case too binary tree would have been enough.

Is there any specific reason skip list is used in lucene for indexing purposes that I have missed?

Upvotes: 2

Views: 1654

Answers (1)

RonC
RonC

Reputation: 33791

Lucene builds an inverted index using Skip-Lists on disk, and then loads a mapping for the indexed terms into memory using a Finite State Transducer (FST). See this SO answer for How does lucene index documents?

In that answer, it also indicates that the primary benefit of using Skip-Lists is that it avoids ever having to rebalance a B-Tree. If you'd like to dig deeper that answer cite another one that provides a lot more detail: Skip List vs. Binary Search Tree Which intern references additional whitepapers.

Researching this a bit more, there is one other advantage to using Skip-Lists rather then a BTree. It's not just the rebalancing that is avoided, but also avoided is the locking of a portion of the tree while the rebalancing is taking place. This aspect is discussed further here. This latter advantage improves concurrency.

Upvotes: 6

Related Questions