Reputation: 87
Hoping to see if anyone is familiar with a known data structure that would perform as follows.
I want to store a set of ranges (ex [1,3], [7,9]) but on insertion the data store would result in the "compressed" version, ie: given the data structure has nodes [2,4] and [5,6] and insertion of [4, 6] would result in a data store that results in one node of [1, 9].
I'm in the process of building a tree to do this, hoping someone may be able to point me to the right "googly" words.
Motivation: Let's say I have a sequence of IDs I expect in a message protocol. I want to be able to store which ones have been received. Let's say the IDs are 64bits, and we can have 64bits worth of IDs. 2^128 seems like a lot of storage to store this information.
So if instead I can store ranges, gaps in sequence IDs should be sparse therefore my idea is to have [n, m] describe that the sequence ids from n to m have been received. I'm open to ideas.
This is where I'm at right now, far from being complete, it is still a working concept. I would ignore this for the most part.
struct Range
{
explicit Range(std::int64_t range1)
: lower(range1), upper(range1)
{
}
void merge(Range& rhs)
{
lower = std::min(rhs.lower, lower);
upper = std::max(rhs.upper, upper);
}
bool operator<(Range& rhs)
{
return upper < rhs.lower;
}
bool operator=(Range& rhs)
{
return lower == rhs.lower && upper == rhs.upper;
}
bool intersect(Range& rhs)
{
return (lower >= rhs.lower && lower <= rhs.upper) || (upper <= rhs.upper && upper >= rhs.lower);
}
std::int64_t lower, upper;
};
bool rangeWithinOne(const Range& lhs, const Range& rhs)
{
if (abs(rhs.lower - lhs.lower) <= 1)
{
return true;
}
if (abs(rhs.upper - lhs.upper) <= 1)
{
return true;
}
return false;
}
struct RangeTreeNode
{
RangeTreeNode(const Range& r)
: right(NULL), left(NULL), range(r)
{
}
const Range& getRange()
{
return range;
}
private:
friend class RangeTree;
RangeTreeNode* right;
RangeTreeNode* left;
Range range;
};
class RangeTree
{
public:
void insert(const Range& r)
{
Range tmp = r;
tmp.lower -= 1;
tmp.upper += 1;
std::pair<RangeTreeNode*, bool> insertion = findIntersection(tmp);
if (insertion.second)
{
insertion.first->merge(r);
// need to merge the sub trees though!!!!
}
}
template<typename Visitor>
void visitAll(const Visitor& v)
{
visitAllHelper<Visitor>(v, mHead)
}
private:
template<typename Visitor>
void visitAllHelper(const Visitor& v, RangeTreeNode * const node)
{
if (node == NULL)
return;
v.visit(node.range);
visitAllHelper(v, node->right);
visitAllHelper(v, node->left);
}
std::pair<RangeTreeNode*, bool> findIntersection(const Range& r) const
{
RangeTreeNode* iter = mHead;
RangeTreeNode* parent;
while (iter != NULL)
{
if (iter->range.intersect(r))
{
return std::make_pair(iter, true);
}
parent = iter;
if (r < *iter)
{
iter = iter->left;
}
else
{
iter = iter->right;
}
}
return std::make_pair(parent, false);
}
RangeTreeNode* mHead;
};
Upvotes: 0
Views: 107
Reputation: 59194
Since you seem to be using C++, the easiest way to do this is to put the ranges into an std::set
or std::map
(if they have associated values) sorted by their end positions.
Then, to insert [s,e]
, you can use lower_bound([s-1,s-1])
to get the first existing range that might overlap or abut the new one, walk forward to find all mergeable ranges, and replace them with a single range.
Something like this:
void insert(int newStart, int newEnd) {
set<Range>::iterator it;
if (newStart > INT_MIN) {
it = theset.lower_bound(Range(newStart-1, newStart-1));
} else {
it = theset.begin();
}
while( it != theset.end() && (newEnd == INT_MAX || it->start <= newEnd+1)) {
newStart = min(it->start, newStart);
newEnd = max(int->end, newEnd);
theset.erase(it++);
}
theset.insert(Range(newStart,newEnd));
}
That way you will have only disjoint ranges in the set/map, which I believe is what you want, and insert takes amortized O(log N) time.
Upvotes: 1