amirouche
amirouche

Reputation: 7883

What is the most efficient datastructure for an ordered sequence with gaps search and append only?

I need to store a set of ordered integers which can form sequences like the following

1,2,3,15,16,20,21,25,26,27,28

It can also be represented as

1-3,15-16,20-21,25-28

I don't need the sequences to be ordered I just need to be able to add integers and know if some integer is in the set.

I'm looking for datastructure that is fast O(lg(n)) or O(n*lg(n)) in terms of insertion and search ie. is X in the set of integers, that can handle concurrent read-write and if possible write-write without locks and without persistence.

For same insertion and search time, the more space efficient solution will be choosen.

A binary search tree but it is not good enough because since integers are inserted in ever growing ascending orderd the generated tree doesn't look good, so I think I need a multiversion self balancing tree.

There is no duplicates.

No code is needed just an explanation with references will do the job.

Background: This is for a mvcc database, each transaction has a transaction id which should be unique while being ordered ie. for two T1(t1), T2(t2), id(T1) < id(T2). The gaps comes from the fact a transaction doesn't commit its transaction id is lost. Transaction ids are used to annotate data versions, to know if a version of a data should be considered and how, you must know at least if it's commited for that I must maintain a list of commited transaction, a hash map of intergers can do the job perfectly for a POC but not in the long run. I don't know how professionnal dbs do that...

Similar question which can be a bit misleading: Finding a gap in an ordered range of adjacent numbers

Upvotes: 1

Views: 1140

Answers (1)

Fred Foo
Fred Foo

Reputation: 363757

I suggest an interval tree -- that's an amended binary search tree that compresses intervals. The problem of ordered insertion can be handled by using a self-balancing variant. Concurrency support can be achieved either with locks or by implementing a persistent version.

Upvotes: 2

Related Questions