Reputation: 18219
Here is a vector of segments
class Segment
{
public:
size_t left;
size_t right;
char ID;
Segment(size_t a, size_t b, char c):left(a), right(b), ID(c){assert(left<right);}
};
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}}
The vector A
is sorted based on the attribute left
. One could draw the content of this vector as
------- 'A'
--------------- 'B'
--- 'C'
--- 'D'
------ 'E'
---- 'F'
--- 'G'
--- 'H'
--- 'I'
------- 'J'
I would like to create object B
that contains small non-overlapping segments from those big segments in A
. To get to B
, we must shrink the segments that overlap with one another and create a new segment with ID X
for all the places that overlap. Vector B
also needs to be sorted based on the left
attribute.
For the above example, the expected output is
std::vector<Segment> B = {{3, 7, 'A'}, {7, 10, `X`}, {10, 14, 'B'}, {14, 19, 'X'}, {19, 22, 'B'} , {25, 28, 'E'}, {28, 31, 'X'}, {31, 32, 'F'}, {34, 37, 'X'}, {46, 49, 'I'}, {52, 59, 'J'}}
---- 'A'
--- 'X' (overlap between 'A' and 'B')
---- 'B'
-------- 'X' (overlap between 'B', 'C' and 'D')
--- 'B' -> Note that 'B' is now split in two
--- 'E'
--- 'X' (overlap between 'E' and 'F')
- 'F'
--- 'X' (overlap between 'G' and 'H')
--- 'I'
------- 'J'
Can anyone give me a hand?
Note that, unlike in the above example, two segments in A
can actually have the same ID (but then, it would be impossible for them to overlap). Note also that A
is not const
and can be modified during the operation. For performance consideration, note that the vector A
is typically relatively short; between 1 and about a hundred (or a few hundreds) segments long. The values left
and right
are typically quite large (in the range 0 to about 1e9) and only few of those segments will intersect. Generally, when there are few segments, those segments will be quite wide (when size is 1, the single segment will often be about 1e9 in width). Finally, you can print the above diagram with
void print(std::vector<Segment>& v)
{
for (auto& elem : v)
{
std::cout << "{" << elem.left << ", " << elem.right << ", " << elem.ID << "} ";
}
std::cout << "\n";
for (auto& elem : v)
{
for (size_t i = 0 ; i < elem.left ; ++i)
std::cout << " ";
for (size_t i = 0 ; i < elem.right - elem.left ; ++i)
std::cout << "-";
std::cout << " " << elem.ID << "\n";
}
std::cout << "\n\n\n";
}
An algorithm that does not need the input to be sorted would be even better.
Attempt
Just to show effort, here is an attempt that 1) is buggy and 2) would be a relatively slow implementation. Let's call "breakpoint", any right of left in the following B
vector. The idea is to jumps from one breakpoint to the next by systematically searching among previous and following segments for a potential next breakpoint. In doing so, it should keep track what ID (if the distance between breakpoint matches at least one segment in A
) should be given in the new segment.
std::vector<Segment> foo(std::vector<Segment>& A)
{
if (A.size() <= 1) return A;
std::vector<Segment> B;
B.reserve(A.size());
size_t A_index = 0;
size_t currentPos = A[A_index].left;
while ( A_index < A.size())
{
auto nextPos = A[A_index].right;
//std::cout << "currentPos = " << currentPos << "\n";
//std::cout << "nextPos before search = " << nextPos << "\n";
bool isIntersection = false;
// Search in preceding Segments
for (size_t i = A_index - 1 ; i < A.size() ; --i)
{
if (A[i].right > currentPos && A[i].right < nextPos )
{
nextPos = A[i].right;
isIntersection = true;
//std::cout << "Found " << nextPos << " in preceding segment\n";
}
}
// Search in following Segments
for (size_t i = A_index+1 ; i < A.size() ; ++i)
{
if ( A[i].left > currentPos && A[i].left < nextPos)
{
nextPos = A[i].left;
//std::cout << "Found left of " << nextPos << " in following segment\n";
break;
}
if ( A[i].right > currentPos && A[i].right < nextPos )
{
nextPos = A[i].right;
isIntersection = true;
//std::cout << "Found right of " << nextPos << " in following segment\n";
break;
}
}
// create new Segment
if (!isIntersection)
{
B.push_back({currentPos, nextPos, A[A_index].ID});
} else
{
B.push_back({currentPos, nextPos, 'X'});
}
if (nextPos == A[A_index].right)
{
++A_index;
nextPos = A[A_index].left;
}
currentPos = nextPos;
}
return B;
}
int main()
{
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
print(A);
auto B = foo(A);
print(B);
}
Upvotes: 4
Views: 544
Reputation: 60208
Here's a solution that computes all the transition points created by segments, and then reconstructs the new segments using these points.
The algorithm is:
Every segment generates 2 transition points, one for the opening, and one for the closing of the segment.
The transition points are sorted.
Construct new segments from every adjacent pair of transition points. Each pair of points represents either:
a) an empty segment (no new segment is added)
b) a single segment (a segment with .ID is added)
c) multiple segments (a segment with 'X' is added)
Newly constructed segments might contain adjacent X
segments, so they need to be merged.
First, a simple struct to store the transition points:
struct Point
{
size_t location;
bool overlap; // does this point start/close a new segment
char ID;
};
The implementation is:
std::vector<Segment> foo(std::vector<Segment> const & segments)
{
// generate all transition points
std::vector<Point> points;
for (auto const & seg : segments)
{
points.push_back({seg.left, true, seg.ID});
points.push_back({seg.right, false, seg.ID});
}
// sort transition points
std::sort(points.begin(), points.end(),
[](auto a, auto b) { return a.location < b.location; });
std::vector<Segment> res;
// initialize overlaps
std::multiset<char> overs{points[0].ID};
// for every adjacent transition point
for(auto i = 1u; i < points.size(); ++i)
{
auto &a = points[i - 1];
auto &b = points[i];
// if there is a jump in between transition points
if (a.location < b.location)
switch (overs.size())
{
// no segment
case 0 : break;
// ony one segment
case 1 : res.push_back({a.location, b.location, *overs.begin()}); break;
// overlapping segment
default : res.push_back({a.location, b.location, 'X'}); break;
}
// update overlaps
if (b.overlap)
overs.insert(b.ID);
else
overs.erase(overs.find(b.ID));
}
// merge adjacent 'X' overlaps
for(auto i = 0u; i < res.size(); ++i)
{
if (res[i].ID == 'X')
{
auto f = std::find_if(res.begin() + i + 1, res.end(),
[](auto r) { return r.ID != 'X'; });
res[i].right = (f - 1)->right;
res.erase(res.begin() + i + 1, f);
}
}
return res;
}
This is an O(n log(n))
algorithm.
Here's a demo.
Upvotes: 1
Reputation: 122133
The following is not the most efficient, but it produces the expected output. Strategy is the following:
For convenience I am using a
struct Overlap {
std::vector<char> IDs;
Overlap() {}
void add(char id) {IDs.push_back(id);}
};
And my implementation of the last step requires to drop the requirement of left<right
in the constructor.
int main() {
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
// dissect
std::map<size_t,Overlap> over;
for (const auto& s : A) {
for (size_t i = s.left; i < s.right; ++i) {
over[i].add(s.ID);
}
}
// assign new segments
std::map<size_t,char> pieces;
for (const auto& o : over) {
if (o.second.IDs.size() == 1) {
pieces[o.first] = o.second.IDs.front();
} else {
pieces[o.first] = 'X';
}
}
// glue them
std::vector<Segment> result;
auto it = pieces.begin();
Segment current(it->first,it->first,it->second); // here left==right !
++it;
for ( ; it != pieces.end(); ++it) {
if (it->second == current.ID) continue;
current.right = it->first -1;
result.push_back(current);
current = Segment{it->first,it->first,it->second};
}
print(result);
}
Output:
{3, 6, A} {7, 9, X} {10, 13, B} {14, 18, X} {19, 24, B} {25, 27, E} {28, 30, X} {31, 33, F} {34, 45, X} {46, 51, I}
--- A
-- X
--- B
---- X
----- B
-- E
-- X
-- F
----------- X
----- I
Actually the only expensive part is 1 (O(number of segments x their width). I believe it can be made more efficient by using two containers, one sorted according to left
and one according to right
, to enable easier detection of overlaps. However, the above naive implementation is what I would start with. Also if the width is considerably small compared to number of segments then it might be that O(number of segments x width) is better than O( log number of segments x number of segments) for sorting.
Upvotes: 2