Reputation: 1239
I have to implement a data structure which supports the following three functions. The data is a pair(a,b) of two double values and the data is concentrated in a particular region. Let's say with values of 'a' in the range 500-600.
Insert(double a, double b) - Insert the data, a pair(double,double) in the data structure. If the first element of the pair already exists, update its second element to the new value.
Delete(double a) - Delete the data containing the first element = a.
PrintData(int count) - Print the value of the data which has the count-th largest value. Value is compared according to data.first.
The input file contains a series of Insert, Delete and PrintData operations. Currently, I have implemented the data structure as a height balanced binary search tree using STL-Map but it is not fast enough.
Is there any other implementation which is faster than a Map. We can use caching to store the most common PrintData queries.
Upvotes: 2
Views: 280
Reputation: 134125
I would recommend an indexed skip list. That will give you O(log n) insert and delete, and O(log n) access to the nth largest value (assuming that you maintain the list in descending order).
Skip list isn't any more difficult to implement than a self-balancing binary tree, and gives much better performance in some situations. Well worth considering.
Upvotes: 2
Reputation: 55649
I'd recommend 2 binary search trees (BSTs) - one being the map from a
to b
(sorted by a
), the other should be sorted by b
.
The second will need to be a custom BST - you'll need to let each node store a count of the number of nodes in the subtree with it as root - these counts can be updated in O(log n), and will allow for O(log n) queries to get the k-th largest element.
When doing an insert, you'll simply look up b's value in the first BST first, then remove that value from the second, then update the first and insert the new value into the second.
For a delete, you'll simply look up b's value in the first BST (and remove that pair), then remove that value from the second.
All mentioned operations should take O(log n).
Caching
If you are, for instance, only going to query the top 10 elements, you could maintain another BST containing only those 10 elements (or even just an optionally-sorted array, since there's only 10 elements), which we'll then query instead of the second BST above.
When inserting, also insert into this structure if the value is greater than the smallest one, and remove the smallest.
When removing, we need to look up the next largest value and insert it into the small BST. Although this could also be done lazily - when removing, just remove it from this BST - don't fill it up to 10 again. When querying, if there are enough elements in this BST, we can just look up using this one, otherwise we find all the values in the big BST required to fill this BST up, then we query.
This would result in best-case O(1) query (worst-case O(log n)), while the other operations will still be O(log n).
Although the added complexity is probably not worth it - O(log n) is pretty fast, even for a large n.
Building on this idea, we could only have this small BST along with the BST mapping a
to b
- this would require that we check all values to find the required ones during a query after a removal, so it would only really be beneficial if there aren't a whole lot of removals.
Upvotes: 2