Thomas Watters
Thomas Watters

Reputation: 87

Structure to hold a set of ranges merging intersecting ranges

Hoping to see if anyone is familiar with a known data structure that would perform as follows.

I want to store a set of ranges (ex [1,3], [7,9]) but on insertion the data store would result in the "compressed" version, ie: given the data structure has nodes [2,4] and [5,6] and insertion of [4, 6] would result in a data store that results in one node of [1, 9].

I'm in the process of building a tree to do this, hoping someone may be able to point me to the right "googly" words.

Motivation: Let's say I have a sequence of IDs I expect in a message protocol. I want to be able to store which ones have been received. Let's say the IDs are 64bits, and we can have 64bits worth of IDs. 2^128 seems like a lot of storage to store this information.

So if instead I can store ranges, gaps in sequence IDs should be sparse therefore my idea is to have [n, m] describe that the sequence ids from n to m have been received. I'm open to ideas.

This is where I'm at right now, far from being complete, it is still a working concept. I would ignore this for the most part.

struct Range
{
    explicit Range(std::int64_t range1)
        : lower(range1), upper(range1)
    {
    }

    void merge(Range& rhs)
    {
        lower = std::min(rhs.lower, lower);
        upper = std::max(rhs.upper, upper);
    }

    bool operator<(Range& rhs)
    {
        return upper < rhs.lower;
    }

    bool operator=(Range& rhs)
    {
        return lower == rhs.lower && upper == rhs.upper;
    }

    bool intersect(Range& rhs)
    {
        return (lower >= rhs.lower && lower <= rhs.upper) || (upper <= rhs.upper && upper >= rhs.lower);
    }

    std::int64_t lower, upper;
};

bool rangeWithinOne(const Range& lhs, const Range& rhs) 
{
    if (abs(rhs.lower - lhs.lower) <= 1)
    {
        return true;
    }

    if (abs(rhs.upper - lhs.upper) <= 1)
    {
        return true;
    }

    return false;
}

struct RangeTreeNode
{
        RangeTreeNode(const Range& r)
            : right(NULL), left(NULL), range(r)
        {
        }

        const Range& getRange()
        {
            return range;
        }
    private:
        friend class RangeTree;
        RangeTreeNode* right;
        RangeTreeNode* left;
        Range range;
};

class RangeTree
{
public:
    void insert(const Range& r)
    {
        Range tmp = r;
        tmp.lower -= 1;
        tmp.upper += 1;

        std::pair<RangeTreeNode*, bool> insertion = findIntersection(tmp);
        if (insertion.second)
        {
            insertion.first->merge(r);
            // need to merge the sub trees though!!!!
        }
    }

    template<typename Visitor>
    void visitAll(const Visitor& v) 
    {
        visitAllHelper<Visitor>(v, mHead)    
    }


private:
    template<typename Visitor>
    void visitAllHelper(const Visitor& v, RangeTreeNode * const node)
    {
        if (node == NULL)
            return;

        v.visit(node.range);
        visitAllHelper(v, node->right);
        visitAllHelper(v, node->left);
    }

    std::pair<RangeTreeNode*, bool> findIntersection(const Range& r) const
    {
        RangeTreeNode* iter = mHead;
        RangeTreeNode* parent;
        while (iter != NULL)
        {
            if (iter->range.intersect(r))
            {
                return std::make_pair(iter, true);
            }

            parent = iter;
            if (r < *iter)
            {
                iter = iter->left;
            }
            else
            {
                iter = iter->right;
            }
        }

        return std::make_pair(parent, false);
    }

    RangeTreeNode* mHead;
};

Upvotes: 0

Views: 107

Answers (1)

Matt Timmermans
Matt Timmermans

Reputation: 59194

Since you seem to be using C++, the easiest way to do this is to put the ranges into an std::set or std::map (if they have associated values) sorted by their end positions.

Then, to insert [s,e], you can use lower_bound([s-1,s-1]) to get the first existing range that might overlap or abut the new one, walk forward to find all mergeable ranges, and replace them with a single range.

Something like this:

void insert(int newStart, int newEnd) {
    set<Range>::iterator it;
    if (newStart > INT_MIN) {
        it = theset.lower_bound(Range(newStart-1, newStart-1));
    } else {
        it = theset.begin();
    }
    while( it != theset.end() && (newEnd == INT_MAX || it->start <= newEnd+1)) {
        newStart = min(it->start, newStart);
        newEnd = max(int->end, newEnd);
        theset.erase(it++);
    }
    theset.insert(Range(newStart,newEnd));
}

That way you will have only disjoint ranges in the set/map, which I believe is what you want, and insert takes amortized O(log N) time.

Upvotes: 1

Related Questions