b. austen
b. austen

Reputation: 151

algorithm challenge: merging date range

I'm facing an interesting problem:

Is it possible to "des-overlap" theses ranges? That is, to generate:

Maybe I can make this a bit more graphical. This is what I have first:

a   |------------------------------|
b                    |-------------------|
c          |-----------------|

This is what I would like to obtain:

    |------|---------|-------|-----|-----|
        a      a,c     a,b,c   a,b    b

I found a solution that kind of works, but which is not elegant:

  1. I transform each range (from, to) into a list of days (d1, d2, d3, etc.)
  2. I group names by day
  3. I aggregate groups that contain the same names to recreate ranges

Can you think of a better solution? I'm working with C# but any language independent idea would be much appreciated. Thanks!

Upvotes: 15

Views: 3372

Answers (6)

Elaine
Elaine

Reputation: 1318

  1. I have a list first, i don't know its length, but i got 3 chars
  2. check for one slot, if A there? add 'A', if B there? add 'B', if c there? add 'C'
  3. go to another slot, cycle like #2
  4. end when nothing added to another new slot
  5. I got the list
  6. flatten the list

Upvotes: 0

Justin L.
Justin L.

Reputation: 13600

I would

  1. Keep an unordered list of "open" ranges
  2. Start from day 1, and add the first range to the open ranges list.
  3. Move until the next range boundary (be it start or close). Create your first "output" range: starting from day 1, ending on that day. In it are the items in your open ranges list.
  4. If the range encountered is already in the open ranges list, remove it. Otherwise, add it.
  5. Repeat 3 and 4, moving along the line.

I definitely did a bad job of explaining it. I'll be writing some code for this soon. But until then, here's what would happen in your solution:

a   |------------------------------|
b                    |-------------------|
c          |-----------------|
1.  Start at day 1, add A to open ranges list, which now contains [A]
2.  Move to the start of C.  First RESULT RANGE: A range from Day 1 to C's start,
      with a value A. (what is in your open ranges list)
3.  Add C to the open ranges list, which now contains [A,C]
4.  Move to the start of B.  Second RESULT RANGE: A range from C's start to B's
      start, with a value A,C. (what is in your open ranges list)
5.  Add B to the open ranges list, which now contains [A,C,B]
6.  Move to the end of C.  Third RESULT RANGE: A range from B's start to C's end,
      with a value of A,C,B.
7.  Remove C from the open ranges list, which now contains [A,B]
8.  Move to the end of A.  Fourth RESULT RANGE: A range from C's end to A's end,
      with a value of A,B
9.  Remove A from the open ranges list, which now contains [B]
10. Move to the end of A.  Fourth RESULT RANGE: A range from A's end to B's end,
      with a value of B

RESULT RANGES
1. from Day 1 to C's start: A
2. from C's start to B's start: A,C
3. from B's start to C's end: A,C,B
4. from C's end to A's end: A,B
5. from A's end to B's end: B

Alternative Method

You could do this:

  1. Keep a ordered list of "output ranges". This makes it easy to test if a point is within a range, and also what ranges follow eachother.
  2. Take a range from your input ranges.
  3. Check to see if it is completely before or completely after all output ranges, and handle accordingly. Skip the next steps and go back to step 2, if so.
  4. Compare its start point to the output ranges
  5. If it is before any other output range, add a new output range from its start to the start of the first output range.
  6. If this is in between an already-existing output range, split that output range at that point. The first part would hold the same "parents", and the second part would have the same parents + the new input range.
  7. Now, compare its end point to the output ranges.
  8. If it is after any other output range, add a new output range from the end of the last output range to its end.
  9. If this is in between an already-existing output range, split that output range at that point. The second part would hold the same "parents", and the first part would have the same parents + the new input range
  10. Add the current input range as a part to all ranges in between the two "processed" ranges in steps 6 and 9, if any.
  11. Repeat 2-6 for all input ranges.

Here are the first few steps, using the sample data + another range D: ("processed" ranges indicated by **double stars**)

a   |------------------------------|
b                    |-------------------|
c          |-----------------|
d       |------------------------|

1.
   Process A
   Output ranges: |--------------A---------------|
2.
   Process B
     - Start of B is in **A**; split A in two:
                  |-------A-------|------AB------|
     - End of B is after any output range range;
         creating new output range at end
                  |-------A-------|------AB------|---B---|
    - Nothing was/is in between **A** and (nothing)
3.
   Process C
     - Start of C is in **A**; split A in two:
                  |---A----|--AC--|------AB------|---B---|
     - End of C is in **AB**; split AB in two:
                  |---A----|--AC--|--ABC--|--AB--|---B---|
     - There were/are no ranges between **A** and **AB**

4.
   Process D
     - Start of D is in **A**; split A in two:
                  |-A-|-AD-|--AC--|--ABC--|--AB--|---B---|
     - End of D is in **AB**; split AB in two:
                  |-A-|-AD-|--AC--|--ABC--|ABD|AB|---B---|
     - Ranges AC and ABC were/are in between **A** and **AB**
                  |-A-|-AD-|--ACD-|-ABCD--|ABD|AB|---B---|

Final output:     |-A-|-AD-|--ACD-|-ABCD--|ABD|AB|---B---|

Upvotes: 10

Peter Alexander
Peter Alexander

Reputation: 54270

Do this:

Create an Event class.

class DateEvent : IComparable<DateEvent>
{
    public Date Date;
    public int DateRangeId;
    public bool IsBegin; // is this the start of a range?

    public int CompareTo(DateEvent other)
    {
        if (Date < other.Date) return -1;
        if (Date > other.Date) return +1;
        if (IsBegin && !other.IsBegin) return -1;
        if (!IsBegin && other.IsBegin) return +1;
        return 0;
    }
}

Define a List<DateEvent> events;

Add each range's start and end date into the list:

for (int i = 0; i < dateRanges.Length; ++i)
{
    DateRange r = dateRanges[i];
    events.Add(new DateEvent(r.BeginDate, i, true));
    events.Add(new DateEvent(r.EndDate, i, false));
}

Sort the events.

events.Sort();

Now set up an output class:

class OutputDateRange
{
    public Date BeginDate;
    public Date EndDate;
    public List<string> Names;
}

Finally, traverse the events, maintaining which ranges are present:

List<int> activeRanges;
List<OutputDateRange> output;
Date current = events[0].Date;
int i = 0;

while (i < events.Length;)
{
    OutputDateRange out = new OutputDateRange();
    out.BeginDate = current;

    // Find ending date for this sub-range.
    while (i < events.Length && events[i].Date == current)
    {
        out.EndDate = events[i].Date;
        if (events[i].IsBegin)
            activeRanges.Add(events[i].DateRangeId);
        else
            activeRanges.Remove(events[i].DateRangeId);
        ++i;
    }

    if (i < events.Length)
        current = events[i].Date;

    foreach (int j in activeRanges)
        out.Names.Add(dateRanges[j].Name);

    output.Add(out);
}

That should do the trick. Note that I've not made the constructors, and the code is a bit ugly, but hopefully that conveys the general idea.

Upvotes: 0

Lasse V. Karlsen
Lasse V. Karlsen

Reputation: 391306

I have code that does this. It relies on the input-set being ordered by from and then to (ie. if it was SQL, it would be ORDER BY from_value, to_value), but after that it is quite optimal.

My implementation basically does what @Justin L.'s answer describes, so if you just want a textual description, look at his answer for the algorithm.

The code is here: LVK.DataStructures, and the files you want to look at are:

To use it:

List<Range<DateTime>> ranges = ...
var slices = ranges.Slice();

This will give you one new range for each slice, and for each such range you would have a .Data property containing references back to the contributing ranges.

ie. for your original example, you would get exactly what you want, individual ranges, with the contributing ranges a, b, c etc. in the .Data property.

The classes might use other portions of my class library, which is all there. If you decide to use it, just copy out the portions you want to use.

If you're only interested in the implementation, it can be found in the RangeExtensions.cs file, line 447 and onwards, InternalSlice method.

Upvotes: 2

Frank
Frank

Reputation: 2648

You could:

  1. sort the list of all dates (combining the from and to dates)
  2. then running along this list, each new range would be from one date to the next start or end date that is different from the preceding.

For naming the new ranges, it would make sense to have the list of original range names that the current new range contains, and each time you hit an end date, remove the old range name from the list, and each tiem you hit a start date, add it's name to the list.

Upvotes: 1

sfussenegger
sfussenegger

Reputation: 36096

You may want to have look at Interval Trees.

Upvotes: 2

Related Questions