user47322

Reputation:

Data structure which maps non-overlapping ranges to values?

I need a data structure which maps non-overlapping ranges (e.g. 8..15, 16..19) to a pointer to a struct.

I will need to be able to look up any element in that range and retrieve that pointer. For example:

structure[8..15] = 0x12345678;
structure[16..19] = 0xdeadbeef;

structure[7]; // => NULL
structure[12]; // => 0x12345678
structure[18]; // => 0xdeadbeef

I'm considering using a binary search tree at the moment. Since the ranges will never overlap, I can search for indexes relatively easily in logarithmic time.

However, I'm wondering if there are any data structures more suitable for this case. I need to be able to efficiently insert, delete and lookup. All of these operations are O(log n) in a BST, but I'm wondering if there's anything that's faster for this.

Upvotes: 2

Views: 458

Answers (2)

Evgeny Kluev

Reputation: 24677

If you want something faster than O(log n), use a van Emde Boas tree.

It can be used the same way as a binary search tree: use the start of each range as the key, and store the end of the range (together with the pointer) as the value mapped to that key. Time complexity is O(log log M), where M is the range of possible keys (INT_MAX, if any integer value is possible for the start of a range).
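The keying scheme can be sketched as follows. Since a van Emde Boas tree isn't in the standard library, std::map stands in for it here (so this sketch is O(log n), not O(log log M)); the struct and member names are illustrative, not from the answer:

```cpp
#include <cstdint>
#include <map>

// Key = start of range; value = (end of range, user pointer).
// std::map is a stand-in: swapping in a van Emde Boas tree keyed the
// same way would give O(log log M) per operation.
struct RangeMap {
    struct Entry { uint32_t end; const void* ptr; };
    std::map<uint32_t, Entry> m;  // start -> (end, pointer)

    void insert(uint32_t start, uint32_t end, const void* p) {
        m[start] = Entry{end, p};
    }

    const void* lookup(uint32_t x) const {
        // Find the range with the largest start <= x.
        auto it = m.upper_bound(x);
        if (it == m.begin()) return nullptr;  // x precedes every range
        --it;
        return (x <= it->second.end) ? it->second.ptr : nullptr;
    }
};
```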

In some cases a van Emde Boas tree has a large memory overhead. If that is not acceptable, use either a simple trie, as explained by Beni, or a Y-fast trie.

Upvotes: 3

Beni Cherniavsky-Paskin

Reputation: 10059

I don't think you can do much better.

Non-overlapping ranges are equivalent to a sequence of alternating start/end points. So a lookup is just "find the largest element ≤ x" followed by an O(1) check of whether it's a start or an end. I.e. an ordered map.
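The alternating-boundary encoding can be sketched like this: each key marks a point where the mapped value changes, inserting [a, b] → p sets p at a and restores "no value" just past b, and a lookup is a single predecessor query. std::map plays the role of the ordered map; names are illustrative:

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Each key is a boundary point; the mapped value is what applies from
// that point onward (nullptr = not inside any range).
// Assumes ranges never overlap, as the question guarantees.
struct BoundaryMap {
    std::map<uint32_t, const void*> m;

    void insert(uint32_t start, uint32_t end, const void* p) {
        m[start] = p;
        // Only close the range if the next boundary isn't already set
        // (an adjacent range may begin at end + 1).
        if (m.find(end + 1) == m.end()) m[end + 1] = nullptr;
    }

    const void* lookup(uint32_t x) const {
        auto it = m.upper_bound(x);        // first boundary > x
        if (it == m.begin()) return nullptr;
        return std::prev(it)->second;      // value in effect at x
    }
};
```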

The usual suspects for that - binary trees, B trees, various tries - are all essentially O(log n). Which is best in practice is a matter of tuning and depends on knowing something about the ranges. Are they sparse or dense? Are they of similar size or do they vary widely? How large is the data (fits in cache / RAM / disk)? Do you insert/delete a lot, or are lookups dominant? Is access random or with high locality?

One tradeoff applicable to many schemes is splitting ranges, replicating the same pointer in several places. This can speed up lookups at the expense of insert/delete time and memory usage. An extreme application is a flat array indexed by point, where lookup is O(1) but insertion is O(size of range); this begs for a multi-level structure: an array of k uniform subranges, each pointing directly to the value if entirely covered by one range, or to a sub-array if not. Hey, I just described a trie! Lookup is log(maxint)/log(k), very fast if k is a power of 2 (e.g. 256); insertion and memory are k*log(n).
But remember that wasting memory hurts cache performance, so any such "optimization" may actually be counter-productive, even for lookups.
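For concreteness, the extreme flat-array end of that tradeoff can be sketched as below: O(1) lookup, O(length of range) insert, and memory proportional to the largest index, so it only makes sense for a small, dense key space. The class name and fixed bound are assumptions for illustration:

```cpp
#include <cstddef>
#include <vector>

// Flat array indexed by point: every point in a range holds a copy of
// the range's pointer.  Lookup is O(1); insert/delete cost O(range
// length); memory grows with the largest index, not the number of ranges.
struct FlatRangeMap {
    std::vector<const void*> slots;

    explicit FlatRangeMap(std::size_t max_index)
        : slots(max_index + 1, nullptr) {}

    void insert(std::size_t start, std::size_t end, const void* p) {
        for (std::size_t i = start; i <= end; ++i) slots[i] = p;
    }

    const void* lookup(std::size_t x) const {
        return x < slots.size() ? slots[x] : nullptr;
    }
};
```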

Upvotes: 2
