Reputation: 9094
If you have a set of ranges, such as the following simple example...
[
[12, 25], #1
[14, 27], #2
[15, 22], #3
[17, 21], #4
[20, 65], #5
[62, 70], #6
[64, 80] #7
]
... how do you compute the maximally intersecting subset (not sure quite how to phrase it, but I mean "the subset of ranges which intersects and has the highest cardinality") and determine the degree of intersection (cardinality of ranges in that subset)?
Logically I can work it out, and might be able to translate that to a naive algorithm. Going down the list, we see that 1-5 intersect, and 5-7 intersect, and that #5 intersects both sets.
The result I want is simply the subset, since that gives me the information about the cardinality, and I can easily compute the intersection of the set as long as they all intersect. In the above example, it would be [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]]
.
Off the top of my head, I might try converting each range to a graph node, connecting the ones which are intersecting, and finding the largest fully-connected graph.
I was also thinking iteratively to start at the beginning, continue building up a list of intersecting ranges with a running intersection on each to check against—until you hit an element which doesn't intersect, then start a new list. Continue checking each item against the existing intersections. However I'm not sure this is complete.
I could take a stab at implementing something (lang is ruby FWIW), but I would love to hear how others might solve this problem, and what the most efficient and elegant way might be.
Update:
I believe this is a specific case of the Maximum clique problem, which is NP-hard and thus actually difficult. Suggestions for approximations/real-world use would be most appreciated!
See also: http://en.wikipedia.org/wiki/Maximum_clique / Find all complete sub-graphs within a graph
Update 2
Found a nice proof of this problem's NP-hardness and NP-completeness here: http://www.cs.bris.ac.uk/~popa/ipl.pdf
Looks like this is the end of the line then. Sorry folks! I'll work with a good-enough greedy approximation. Thanks.
As said in the answers I don't think that paper describes this problem... we probably have more information here based on the ranges.
Upvotes: 7
Views: 4147
Reputation: 22977
Here is a Java implementation of the solution proposed by Dave:
public static List<Range> getMaxIntersectingSubset(List<Range> ranges) {
// First, we collect "a sorted list of the starting and ending points".
// We're sorting such that all lower bounds come first
List<Bound> intersections = Arrays.stream(ranges)
.flatMap(arr -> Stream.of(Bound.ofLowerBound(arr[0]), Bound.ofUpperBound(arr[1])))
.sorted(Comparator
.comparing(Bound::value)
.thenComparing((a, b) -> Boolean.compare(!a.isLowerBound(), b.isLowerBound())))
)
.collect(Collectors.toList());
long curOverlap = 0;
long maxOverlap = 0;
List<Integer> indexes = new ArrayList<>();
// Then we iterate the list, searching for the highest overlap
for (int i = 0; i < intersections.size(); i++) {
Bound intersection = intersections.get(i);
curOverlap += (intersection.isLowerBound() ? 1 : -1);
if (curOverlap > maxOverlap) {
maxOverlap = curOverlap;
indexes.clear();
indexes.add(i);
}
else if (curOverlap == maxOverlap) {
indexes.add(i);
}
}
return indexes.stream()
.map(index -> Range.of(intersections.get(index).value(), intersections.get(index + 1).value()))
.collect(Collectors.toList());
}
public record Bound(int value, boolean isLowerBound) {
public static Bound ofLowerBound(int value) {
return new Bound(value, true);
}
public static Bound ofUpperBound(int value) {
return new Bound(value, false);
}
}
public record Range(int start, int end) {
private Range(int start, int end) {
if (start > end) {
throw new IllegalArgumentException("start > end");
}
this.start = start;
this.end = end;
}
public static Range of(int start, int end) {
return new Range(start, end);
}
}
These are some test sets:
From this Stackoverflow question
int[][] ints = {
{ 1, 100 },
{ 30, 95 },
{ 10, 60 },
{ 15, 25 },
{ 33, 66 },
{ 20, 50 },
{ 51, 100 },
{ 25, 70 }
};
According to OP of abovementioned question, the result should be [ 30, 55 ]
. The provided code outputs this result. However, there's another overlap, although less wide. See pictures below.
OP of this question mentions this list:
int[][] ints = {
{ 12, 25 },
{ 14, 27 },
{ 15, 22 },
{ 17, 21 },
{ 20, 65 },
{ 62, 70 },
{ 64, 80 }
};
Here, the output is [ 20, 21 ]
. This matches the below diagram:
Upvotes: 4
Reputation: 9075
If I understand the problem correctly, it is not an instance of the NP problem described in the paper you linked to. Here is my understanding of the problem, and a polynomial-time solution.
We are given a finite set of ranges of real numbers, say n: [A1, B1], [A2, B2], ..., [An, Bn], where Ai ≤ Bi.
Create a sorted list of the starting and ending points, ordered numerically, indicating whether the point is a starting or ending point.
In your example, this would be: 12+, 14+, 15+, 17+, 20+, 21-, 22-, 25-, 27-, 62+, 64+, 65-, 70-, 80-
Initialize curOverlap
and maxOverlap
to zero.
Iterate through the list, incrementing curOverlap
for each + and decrementing it for each -. Set maxOverlap = max(curOverlap, maxOverlap)
on each increment.
To continue your example:
val, cur, max
12, 1, 1
14, 2, 2
15, 3, 3
17, 4, 4
20, 5, 5
21, 4, 5
22, 3, 5
25, 2, 5
27, 1, 5
62, 2, 5
64, 3, 5
65, 2, 5
70, 1, 5
80, 0, 5
The max overlap is 5. You could also store the val associated with the max if you wanted to know where the max overlap occurred. That would give you 20. in this example. It's then trivial to go through the initial set of ranges and find the 5 which include 20.
-edit- If you have repeated values, count the plusses before the minuses for each value so that you include ranges that overlap at a single point.
Upvotes: 17