Pheonix

Reputation: 6052

A fast data structure for lower bound and upper bound queries on integers?

I have to maintain a list of integers, with values up to 100,000...

if the data is (for example)

1, 4, 9, 12, 20, 35, 52, 77, 91

and I query for a number, say 27, I want the number in the list immediately below 27, which here is: 20

The data is also going to be modified frequently, with lots of inserts and erases.

Currently I am using std::set coupled with

set<int>::iterator it = lower_bound(values.begin(), values.end(), n);

so

so *it == 35, and after it-- I get 20... but this is not fast enough. The number of queries is large, up to 500,000, and they are a mix of modifications of my values and lookups.
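For reference, here is a minimal, self-contained version of the approach described above (just a sketch; values and n are the names from the snippet). It uses the set's own lower_bound member, which is logarithmic; the free std::lower_bound over a set's bidirectional iterators has to advance element by element and is effectively linear.

#include <iostream>
#include <set>

int main() {
    std::set<int> values = {1, 4, 9, 12, 20, 35, 52, 77, 91};
    int n = 27;

    // First element not less than n (35 in this example).
    std::set<int>::iterator it = values.lower_bound(n);

    if (it != values.begin()) {
        --it;                        // step back to the predecessor
        std::cout << *it << '\n';    // prints 20
    }
}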

Please give me some pointers.

Upvotes: 3

Views: 2801

Answers (3)

Nemo

Reputation: 71525

100,000 is small enough that I would consider just using a bit vector... 100,000 bits is only 12.5K, which should be pretty quick to search and will even fit in L1 cache.

Store the bits backward (i.e. 100,000 near the beginning; 0 near the end) so that a predecessor scan moves forward through memory and you can use ffs (if your platform has it).
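As a rough sketch of the word-at-a-time scan (an illustration, not this answer's exact code: it assumes values in [0, 100000], keeps the bits in natural order, and uses GCC/Clang's __builtin_clzll to find the highest set bit instead of reversing the layout and using ffs):

#include <cstdint>
#include <iostream>
#include <vector>

const int MAXV = 100001;                          // assumed value range [0, 100000]
std::vector<uint64_t> words((MAXV + 63) / 64, 0); // one bit per possible value

void insert(int v) { words[v / 64] |=  (1ULL << (v % 64)); }
void erase(int v)  { words[v / 64] &= ~(1ULL << (v % 64)); }

// Largest stored value strictly less than q, or -1 if none.
int predecessor(int q) {
    if (q <= 0) return -1;
    int w = (q - 1) / 64, b = (q - 1) % 64;
    // Mask off bits above position b in the first word, then walk whole words down.
    uint64_t cur = words[w] & ((b == 63) ? ~0ULL : ((1ULL << (b + 1)) - 1));
    while (true) {
        if (cur != 0)
            return w * 64 + 63 - __builtin_clzll(cur);  // index of highest set bit
        if (--w < 0) return -1;
        cur = words[w];
    }
}

int main() {
    for (int v : {1, 4, 9, 12, 20, 35, 52, 77, 91}) insert(v);
    std::cout << predecessor(27) << '\n';               // prints 20
}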

Upvotes: 0

Andrew

Reputation: 551

You can divide all the numbers into (for example) 100 vectors:

1-> [0..99]
2-> [100..199]
.....

These vectors should be kept sorted. Searching a vector with the lower_bound/upper_bound functions is usually faster than searching an associative container. But for an insertion or deletion you will need to re-sort (or shift) one small vector, as sketched below.
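A hedged sketch of this bucketing scheme (the bucket width of 100 and the value range [0, 100000) are assumptions for illustration; each bucket is a sorted std::vector, and the predecessor scan may visit empty buckets in the worst case):

#include <algorithm>
#include <iostream>
#include <vector>

const int MAXV = 100000, WIDTH = 100;                 // assumed range and bucket width
std::vector<std::vector<int>> buckets(MAXV / WIDTH);  // bucket i holds [i*WIDTH, (i+1)*WIDTH)

void insert(int v) {
    std::vector<int>& b = buckets[v / WIDTH];
    b.insert(std::lower_bound(b.begin(), b.end(), v), v);   // keep the bucket sorted
}

void erase(int v) {
    std::vector<int>& b = buckets[v / WIDTH];
    std::vector<int>::iterator it = std::lower_bound(b.begin(), b.end(), v);
    if (it != b.end() && *it == v) b.erase(it);
}

// Largest stored value strictly less than q, or -1 if none.
int predecessor(int q) {
    int start = std::min(q / WIDTH, (int)buckets.size() - 1);
    for (int i = start; i >= 0; --i) {
        std::vector<int>& b = buckets[i];
        std::vector<int>::iterator it = std::lower_bound(b.begin(), b.end(), q);
        if (it != b.begin()) return *--it;            // largest element below q in this bucket
    }
    return -1;
}

int main() {
    for (int v : {1, 4, 9, 12, 20, 35, 52, 77, 91}) insert(v);
    std::cout << predecessor(27) << '\n';             // prints 20
}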

UPD: I agree with templatetypedef: a van Emde Boas tree could be a better solution.

Upvotes: 0

templatetypedef

Reputation: 372714

A few different ideas come to mind.

For starters, there is a specialized data structure for precisely this problem called a van Emde Boas tree that stores integers in some fixed range [0, U) and supports successor and predecessor searches in O(log log U) time. This is exponentially faster than using a standard binary search tree to do the comparisons. If you know an upper bound on the integers that you're storing, this structure might be able to get you a bit more performance. There are other related structures, like the y-fast trie, that can be used here as well.
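A full van Emde Boas tree is too long to reproduce here, but the core idea (split the universe into blocks and keep a summary of which blocks are non-empty) can be sketched in a simplified, non-recursive form. This is only an illustration of that idea, not a real vEB tree: a real one recurses on the blocks and on the summary to reach O(log log U), whereas the scans below are linear; the block size of 512 and the range [0, 100000) are assumptions, and deletion is omitted.

#include <bitset>
#include <iostream>

const int U = 100000, B = 512;               // assumed universe size and block size
const int NBLOCKS = (U + B - 1) / B;

std::bitset<B> block[NBLOCKS];               // bits of values within each block
std::bitset<NBLOCKS> summary;                // which blocks contain anything

void insert(int v) { block[v / B].set(v % B); summary.set(v / B); }

// Largest stored value strictly less than q, or -1 if none.
int predecessor(int q) {
    int hi = q / B, lo = q % B;
    for (int b = lo - 1; b >= 0; --b)        // look inside q's own block first
        if (block[hi].test(b)) return hi * B + b;
    for (int s = hi - 1; s >= 0; --s)        // otherwise find the previous non-empty block
        if (summary.test(s))
            for (int b = B - 1; b >= 0; --b) // and take its maximum element
                if (block[s].test(b)) return s * B + b;
    return -1;
}

int main() {
    for (int v : {1, 4, 9, 12, 20, 35, 52, 77, 91}) insert(v);
    std::cout << predecessor(27) << '\n';    // prints 20
}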

Second, if the queries you have are not uniformly distributed, you may want to build your own binary search tree that is optimized to minimize the total number of nodes looked up. Such a search tree is called an optimal binary search tree, and there are fast algorithms for approximating one in O(n log n) time. In this earlier question, I detailed one approach for doing this. This preprocessing might give you much faster lookups, since the tree is specifically built to optimize lookup times. Alternatively, you could look into splay trees, which give comparable performance.

Hope this helps!

Upvotes: 6
