Remi.b
Remi.b

Reputation: 18219

Splitting apart overlapping segments

Here is a vector of segments

class Segment
{
public:
   size_t left;
   size_t right;
   char ID;
   Segment(size_t a, size_t b, char c):left(a), right(b), ID(c){assert(left<right);}
};


std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}}

The vector A is sorted based on the attribute left. One could draw the content of this vector as

       -------  'A'
           ---------------  'B'
                  ---  'C'
                    ---   'D' 
                             ------  'E'
                                ----   'F'
                                      ---  'G'
                                      ---  'H'
                                                  ---  'I'
                                                        -------  'J'

I would like to create object B that contains small non-overlapping segments from those big segments in A. To get to B, we must shrink the segments that overlap with one another and create a new segment with ID X for all the places that overlap. Vector B also needs to be sorted based on the left attribute.

For the above example, the expected output is

std::vector<Segment> B = {{3, 7, 'A'}, {7, 10, `X`}, {10, 14, 'B'}, {14, 19, 'X'}, {19, 22, 'B'} , {25, 28, 'E'}, {28, 31, 'X'}, {31, 32, 'F'}, {34, 37, 'X'}, {46, 49, 'I'}, {52, 59, 'J'}}


       ----  'A'
           ---  'X'  (overlap between 'A' and 'B')
              ----  'B'
                  --------  'X' (overlap between 'B', 'C' and 'D')
                          ---  'B'  -> Note that 'B' is now split in two
                                ---  'E'
                                   ---   'X'  (overlap between 'E' and 'F')
                                      -  'F'
                                         ---  'X' (overlap between 'G' and 'H')
                                                     ---  'I'
                                                           -------  'J'

Can anyone give me a hand?

Note that, unlike in the above example, two segments in A can actually have the same ID (but then, it would be impossible for them to overlap). Note also that A is not const and can be modified during the operation. For performance consideration, note that the vector A is typically relatively short; between 1 and about a hundred (or a few hundreds) segments long. The values left and right are typically quite large (in the range 0 to about 1e9) and only few of those segments will intersect. Generally, when there are few segments, those segments will be quite wide (when size is 1, the single segment will often be about 1e9 in width). Finally, you can print the above diagram with

void print(std::vector<Segment>& v)
{
    for (auto& elem : v)
    {
        std::cout << "{" << elem.left << ", " << elem.right << ", " << elem.ID << "} ";
    }
    std::cout << "\n";

    for (auto& elem : v)
    {
        for (size_t i = 0 ; i < elem.left ; ++i)
            std::cout << " ";
        for (size_t i = 0 ; i < elem.right - elem.left ; ++i)
            std::cout << "-";
        std::cout << "  " << elem.ID << "\n";           
    }
    std::cout << "\n\n\n";  
}

An algorithm that does not need the input to be sorted would be even better.

Attempt

Just to show effort, here is an attempt that 1) is buggy and 2) would be a relatively slow implementation. Let's call "breakpoint", any right of left in the following B vector. The idea is to jumps from one breakpoint to the next by systematically searching among previous and following segments for a potential next breakpoint. In doing so, it should keep track what ID (if the distance between breakpoint matches at least one segment in A) should be given in the new segment.

std::vector<Segment> foo(std::vector<Segment>& A)
{
    if (A.size() <= 1) return A;

    std::vector<Segment> B;
    B.reserve(A.size());
    
    size_t A_index = 0;
    size_t currentPos = A[A_index].left;
    while ( A_index < A.size())
    {
        auto nextPos = A[A_index].right;

        //std::cout << "currentPos = " << currentPos << "\n";
        //std::cout << "nextPos before search = " << nextPos << "\n";

        bool isIntersection = false;
        // Search in preceding Segments
        for (size_t i = A_index - 1 ; i < A.size() ; --i)
        {
            if (A[i].right > currentPos && A[i].right < nextPos )
            {
                nextPos = A[i].right;
                isIntersection = true;
                //std::cout << "Found " << nextPos << " in preceding segment\n";
            }
        }

        // Search in following Segments
        for (size_t i = A_index+1 ; i < A.size() ; ++i)
        {
            if ( A[i].left > currentPos && A[i].left < nextPos)
            {
                nextPos = A[i].left;
                //std::cout << "Found left of " << nextPos << " in following segment\n";
                break;
            }

            if ( A[i].right > currentPos &&  A[i].right < nextPos )
            {
                nextPos = A[i].right;
                isIntersection = true;
                //std::cout << "Found right of " << nextPos << " in following segment\n";
                break;
            }
        }

        // create new Segment
        if (!isIntersection)
        {
            B.push_back({currentPos, nextPos, A[A_index].ID});
        } else
        {
            B.push_back({currentPos, nextPos, 'X'});
        }
        if (nextPos == A[A_index].right)
        {
            ++A_index;
            nextPos = A[A_index].left;
        }
        currentPos = nextPos;
    }

    return B;
}


int main()
{
    std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
    print(A);
    auto B = foo(A);
    print(B);
}

Upvotes: 4

Views: 544

Answers (2)

cigien
cigien

Reputation: 60208

Here's a solution that computes all the transition points created by segments, and then reconstructs the new segments using these points.

The algorithm is:

  1. Every segment generates 2 transition points, one for the opening, and one for the closing of the segment.

  2. The transition points are sorted.

  3. Construct new segments from every adjacent pair of transition points. Each pair of points represents either:

    a) an empty segment (no new segment is added)

    b) a single segment (a segment with .ID is added)

    c) multiple segments (a segment with 'X' is added)

  4. Newly constructed segments might contain adjacent X segments, so they need to be merged.

First, a simple struct to store the transition points:

struct Point
{
    size_t location; 
    bool overlap;    // does this point start/close a new segment
    char ID;
};

The implementation is:

std::vector<Segment> foo(std::vector<Segment> const & segments)
{
    // generate all transition points
    std::vector<Point> points;
    for (auto const & seg : segments)
    {
        points.push_back({seg.left, true, seg.ID});
        points.push_back({seg.right, false, seg.ID});
    }

    // sort transition points
    std::sort(points.begin(), points.end(), 
      [](auto a, auto b) { return a.location < b.location; });

    std::vector<Segment> res;

    // initialize overlaps
    std::multiset<char> overs{points[0].ID};

    // for every adjacent transition point
    for(auto i = 1u; i < points.size(); ++i) 
    {
        auto &a = points[i - 1];
        auto &b = points[i];

        // if there is a jump in between transition points
        if (a.location < b.location)
           switch (overs.size())
           {
               // no segment
               case 0 : break;
               // ony one segment
               case 1 : res.push_back({a.location, b.location, *overs.begin()}); break;
               // overlapping segment
               default : res.push_back({a.location, b.location, 'X'}); break;
           }

        // update overlaps
        if (b.overlap)
           overs.insert(b.ID);
        else
           overs.erase(overs.find(b.ID));  
    }
    
    // merge adjacent 'X' overlaps 
    for(auto i = 0u; i < res.size(); ++i) 
    {
         if (res[i].ID == 'X')
         {
           auto f = std::find_if(res.begin() + i + 1, res.end(),
            [](auto r) { return r.ID != 'X'; });
           res[i].right = (f - 1)->right;
           res.erase(res.begin() + i + 1, f); 
         }
     }
        
    return res;
}

This is an O(n log(n)) algorithm.

Here's a demo.

Upvotes: 1

463035818_is_not_an_ai
463035818_is_not_an_ai

Reputation: 122133

The following is not the most efficient, but it produces the expected output. Strategy is the following:

  1. (It actually does not require the input vector to be sorted)
  2. Dissect all Segments into 1 wide pieces. Store them in a map that maps the position to the IDs present at that position
  3. Assign new IDs according to overlap
  4. Glue the pieces again together

For convenience I am using a

struct Overlap {
    std::vector<char> IDs;
    Overlap() {}
    void add(char id) {IDs.push_back(id);}
};

And my implementation of the last step requires to drop the requirement of left<right in the constructor.

int main() {
    std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};   
    // dissect
    std::map<size_t,Overlap> over;
    for (const auto& s : A) {
        for (size_t i = s.left; i < s.right; ++i) {
            over[i].add(s.ID);
        }
    }
    // assign new segments 
    std::map<size_t,char> pieces;
    for (const auto& o : over) {
        if (o.second.IDs.size() == 1) {
            pieces[o.first] = o.second.IDs.front();
        } else {
            pieces[o.first] = 'X';
        }
    }
    // glue them
    std::vector<Segment> result;
    auto it = pieces.begin();
    Segment current(it->first,it->first,it->second);  // here left==right !
    ++it;
    for ( ; it != pieces.end(); ++it) {
        if (it->second == current.ID) continue;
        current.right = it->first -1;
        result.push_back(current);
        current = Segment{it->first,it->first,it->second};
    }

    print(result);
}

Output:

{3, 6, A} {7, 9, X} {10, 13, B} {14, 18, X} {19, 24, B} {25, 27, E} {28, 30, X} {31, 33, F} {34, 45, X} {46, 51, I} 
   ---  A
       --  X
          ---  B
              ----  X
                   -----  B
                         --  E
                            --  X
                               --  F
                                  -----------  X
                                              -----  I

Live Example

Actually the only expensive part is 1 (O(number of segments x their width). I believe it can be made more efficient by using two containers, one sorted according to left and one according to right, to enable easier detection of overlaps. However, the above naive implementation is what I would start with. Also if the width is considerably small compared to number of segments then it might be that O(number of segments x width) is better than O( log number of segments x number of segments) for sorting.

Upvotes: 2

Related Questions