Splitting apart overlapping segments

Question

Here is a vector of segments

class Segment
{
public:
   size_t left;
   size_t right;
   char ID;
   Segment(size_t a, size_t b, char c):left(a), right(b), ID(c){assert(left < right);}
};

std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};

The vector A is sorted based on the attribute left. One could draw the content of this vector as

       -------  'A'
           ---------------  'B'
                  ---  'C'
                    ---   'D' 
                             ------  'E'
                                ----   'F'
                                      ---  'G'
                                      ---  'H'
                                                  ---  'I'
                                                        -------  'J'

I would like to create a vector B that contains small non-overlapping segments built from the big segments in A. To get B, we must shrink the segments that overlap one another and create a new segment with ID 'X' for every region where they overlap. Vector B also needs to be sorted by the left attribute.

For the above example, the expected output is

std::vector<Segment> B = {{3, 7, 'A'}, {7, 10, 'X'}, {10, 14, 'B'}, {14, 19, 'X'}, {19, 22, 'B'} , {25, 28, 'E'}, {28, 31, 'X'}, {31, 32, 'F'}, {34, 37, 'X'}, {46, 49, 'I'}, {52, 59, 'J'}};


       ----  'A'
           ---  'X'  (overlap between 'A' and 'B')
              ----  'B'
                  --------  'X' (overlap between 'B', 'C' and 'D')
                          ---  'B'  -> Note that 'B' is now split in two
                                ---  'E'
                                   ---   'X'  (overlap between 'E' and 'F')
                                      -  'F'
                                         ---  'X' (overlap between 'G' and 'H')
                                                     ---  'I'
                                                           -------  'J'

Can anyone give me a hand?

Note that, unlike in the above example, two segments in A can have the same ID (in which case they cannot overlap). Note also that A is not const and can be modified during the operation. For performance considerations, note that the vector A is typically relatively short: between 1 and about a hundred (or a few hundred) segments long. The values left and right are typically quite large (in the range 0 to about 1e9), and only a few of those segments will intersect. Generally, when there are few segments, those segments will be quite wide (when the size is 1, the single segment will often be about 1e9 in width). Finally, you can print the above diagram with

void print(std::vector<Segment>& v)
{
    for (auto& elem : v)
    {
        std::cout << "{" << elem.left << ", " << elem.right << ", " << elem.ID << "} ";
    }
    std::cout << "\n";

    for (auto& elem : v)
    {
        for (size_t i = 0 ; i < elem.left ; ++i)
            std::cout << " ";
        for (size_t i = 0 ; i < elem.right - elem.left ; ++i)
            std::cout << "-";
        std::cout << "  " << elem.ID << "\n";
    }
    std::cout << "\n\n\n";
}

An algorithm that does not need the input to be sorted would be even better.

Attempt

Just to show effort, here is an attempt that 1) is buggy and 2) would be a relatively slow implementation. Let's call a "breakpoint" any right or left value in the resulting B vector. The idea is to jump from one breakpoint to the next by systematically searching among the preceding and following segments for a potential next breakpoint. In doing so, it should keep track of which ID should be given to the new segment (if the span between the two breakpoints is covered by at least one segment in A).

std::vector<Segment> foo(std::vector<Segment>& A)
{
    if (A.size() <= 1) return A;

    std::vector<Segment> B;
    B.reserve(A.size());
    
    size_t A_index = 0;
    size_t currentPos = A[A_index].left;
    while ( A_index < A.size())
    {
        auto nextPos = A[A_index].right;

        //std::cout << "currentPos = " << currentPos << "\n";
        //std::cout << "nextPos before search = " << nextPos << "\n";

        bool isIntersection = false;
        // Search in preceding Segments
        for (size_t i = A_index - 1 ; i < A.size() ; --i)
        {
            if (A[i].right > currentPos && A[i].right < nextPos )
            {
                nextPos = A[i].right;
                isIntersection = true;
                //std::cout << "Found " << nextPos << " in preceding segment\n";
            }
        }

        // Search in following Segments
        for (size_t i = A_index+1 ; i < A.size() ; ++i)
        {
            if ( A[i].left > currentPos && A[i].left < nextPos)
            {
                nextPos = A[i].left;
                //std::cout << "Found left of " << nextPos << " in following segment\n";
                break;
            }

            if ( A[i].right > currentPos &&  A[i].right < nextPos )
            {
                nextPos = A[i].right;
                isIntersection = true;
                //std::cout << "Found right of " << nextPos << " in following segment\n";
                break;
            }
        }

        // create new Segment
        if (!isIntersection)
        {
            B.push_back({currentPos, nextPos, A[A_index].ID});
        } else
        {
            B.push_back({currentPos, nextPos, 'X'});
        }
        if (nextPos == A[A_index].right)
        {
            ++A_index;
            nextPos = A[A_index].left;
        }
        currentPos = nextPos;
    }

    return B;
}


int main()
{
    std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
    print(A);
    auto B = foo(A);
    print(B);
}

cigien · Accepted Answer

Here's a solution that computes all the transition points created by segments, and then reconstructs the new segments using these points.

The algorithm is:

1. Every segment generates 2 transition points, one for the opening and one for the closing of the segment.
2. The transition points are sorted.
3. Construct new segments from every adjacent pair of transition points. Each pair of points represents either:

a) an empty segment (no new segment is added)

b) a single segment (a segment with .ID is added)

c) multiple segments (a segment with 'X' is added)

4. Newly constructed segments might contain adjacent X segments, so they need to be merged.

First, a simple struct to store the transition points:

struct Point
{
    size_t location; 
    bool overlap;    // true if this point opens a segment, false if it closes one
    char ID;
};

The implementation is:

#include <algorithm>  // std::sort
#include <set>        // std::multiset
#include <vector>

std::vector<Segment> foo(std::vector<Segment> const & segments)
{
    // nothing to split (also avoids reading points[0] below)
    if (segments.empty()) return {};

    // generate all transition points
    std::vector<Point> points;
    for (auto const & seg : segments)
    {
        points.push_back({seg.left, true, seg.ID});
        points.push_back({seg.right, false, seg.ID});
    }

    // sort transition points
    std::sort(points.begin(), points.end(), 
      [](auto const & a, auto const & b) { return a.location < b.location; });

    std::vector<Segment> res;

    // initialize overlaps; the leftmost point always opens a segment
    std::multiset<char> overs{points[0].ID};

    // for every adjacent transition point
    for(auto i = 1u; i < points.size(); ++i) 
    {
        auto &a = points[i - 1];
        auto &b = points[i];

        // if there is a jump in between transition points
        if (a.location < b.location)
           switch (overs.size())
           {
               // no segment
               case 0 : break;
               // only one segment
               case 1 : res.push_back({a.location, b.location, *overs.begin()}); break;
               // overlapping segment
               default : res.push_back({a.location, b.location, 'X'}); break;
           }

        // update overlaps
        if (b.overlap)
           overs.insert(b.ID);
        else
           overs.erase(overs.find(b.ID));  
    }
    
    // merge runs of 'X' overlaps that touch each other
    for(auto i = 0u; i < res.size(); ++i) 
    {
         if (res[i].ID == 'X')
         {
           // f ends up one past the last 'X' segment contiguous with res[i];
           // 'X' segments separated by a gap are left alone
           auto f = res.begin() + i + 1;
           while (f != res.end() && f->ID == 'X' && f->left == (f - 1)->right)
               ++f;
           res[i].right = (f - 1)->right;
           res.erase(res.begin() + i + 1, f); 
         }
     }
        
    return res;
}

This is an O(n log(n)) algorithm.

Here's a demo.
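
For checking the approach locally, here is a self-contained sketch of the same sweep over transition points. It is not the answer's original code or its demo: the names `Event` and `splitOverlaps` are made up for this illustration, and `Segment` is redeclared as a plain aggregate so the block compiles on its own. It assumes `left < right` for every segment, as the question's constructor asserts. Two details differ from the code above and are design choices of this sketch only: ties at the same location are sorted with openings first, and touching 'X' pieces are extended as they are emitted rather than merged in a second pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Plain aggregate stand-in for the question's Segment class.
struct Segment
{
    std::size_t left;
    std::size_t right;
    char ID;
};

bool operator==(Segment const & a, Segment const & b)
{
    return a.left == b.left && a.right == b.right && a.ID == b.ID;
}

// Hypothetical helper type: one event per segment end.
struct Event
{
    std::size_t location;
    bool opens;   // true for a left end, false for a right end
    char ID;
};

// Hypothetical name; the input does not need to be sorted.
std::vector<Segment> splitOverlaps(std::vector<Segment> const & segments)
{
    std::vector<Event> events;
    events.reserve(2 * segments.size());
    for (auto const & seg : segments)
    {
        assert(seg.left < seg.right);  // assumption taken from the question
        events.push_back({seg.left, true, seg.ID});
        events.push_back({seg.right, false, seg.ID});
    }

    // Sort by location; at equal locations, openings come before closings.
    std::sort(events.begin(), events.end(),
        [](Event const & a, Event const & b)
        {
            if (a.location != b.location) return a.location < b.location;
            return a.opens && !b.opens;
        });

    std::vector<Segment> res;
    std::multiset<char> active;  // IDs of the segments covering the current span
    std::size_t prev = 0;        // location of the previous event

    for (auto const & ev : events)
    {
        // Emit the span [prev, ev.location) if something covers it.
        if (!active.empty() && prev < ev.location)
        {
            char const id = (active.size() == 1) ? *active.begin() : 'X';
            bool const touchesX = id == 'X' && !res.empty()
                && res.back().ID == 'X' && res.back().right == prev;
            if (touchesX)
                res.back().right = ev.location;  // grow the previous 'X' piece
            else
                res.push_back({prev, ev.location, id});
        }

        if (ev.opens)
            active.insert(ev.ID);
        else
            active.erase(active.find(ev.ID));  // remove one instance only
        prev = ev.location;
    }
    return res;
}
```

Called on the vector A from the question, `splitOverlaps` returns the segments listed in the expected vector B. Because each 'X' piece is either appended or grown in place, no elements are erased from the middle of the result, so the sort remains the dominant cost.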
