windowsgm
windowsgm

Reputation: 1616

Implementing Custom Int+Range List Solution

I'm wondering if anyone can come up with a way to implement an array of numbers in a more memory efficient manner that will auto-organise itself into ranges. Example;

List testList = new List{1,2,3,4,5,6,7...};

vs

List<Range> testList = new List<Range>{1-3000,3002,4000-5000...};

Previously, I have asked a question just to confirm about whether or not this would in fact be a more memory efficient alternative. This question however pertains to actual application, how to implement this range list solution.

Index Array Storage Memory

I imagine this would perhaps need to be a custom list solution that would be a mix of ints and ranges. I'm picturing being able to .Add([int]) to the list, at which point it would determine if the value would cause a range to be added or to simply add the int value to the list.

Example

RangeList rangeList = new RangeList{1, 4, 7-9};
rangeList.Add(2);
//rangeList -> 1-2, 4, 7-9
rangeList.Add(3);
//rangeList -> 1-3, 4, 7-9

Details specific to my implementation

In my particular case, I'm analysing a very large document, line by line. Lines that meet a certain criteria need to be identified and then the overall list of line indexes need to be presented to the user.

Obviously displaying "Lines 33-32019 identified" is preferable to "Lines 33,34,35...etc". For this case, numbers will always be positive.

Upvotes: 2

Views: 328

Answers (1)

Jamiec
Jamiec

Reputation: 136239

The first thing I would do is make a class which represents your range. You can provide some convenience like formatting as a string, and having an implicit cast from an int (This helps later implementation of the range list)

public class Range
{
    public int Start{get; private set;}
    public int End{get; private set;}

    public Range(int startEnd) : this(startEnd,startEnd)
    {           
    }

     public Range(int start, int end)
     {
        this.Start = start;
        this.End = end;
     }

    public static implicit operator Range(int i)
    {
        return new Range(i);
    }

    public override string ToString()
    {
        if(Start == End)
            return Start.ToString();
        return String.Format("{0}-{1}",Start,End);
    }
}

You can then begin a simple implementation of the RangeList. By providing an Add method you can use a list initializer similar to List<T>:

public class RangeList : IEnumerable<Range>
{
    private List<Range> ranges = new List<Range>();

    public void Add(Range range)
    {
        this.ranges.Add(range);
    }

    public IEnumerator<Range> GetEnumerator()
    {
        return this.ranges.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator(){
        return this.GetEnumerator();
    }
}

At this point you can write some test code:

var rangeList = new RangeList(){
    new Range(1,10),
    15
};

foreach(var range in rangeList)
    Console.WriteLine(range);

// Outputs:
//  1-10
//  15

Live example at this point: http://rextester.com/NCZSA71850

The next thing to do is provide an overload of Add which takes an int and finds the right range or adds a new one. A naive implemntation might look like the below (Assuming the addition of an Update method on range)

public void Add(int i)
{
    // is it within or contiguous to an existing range
    foreach(var range in ranges)
    {
        if(i>=range.Start && i<=range.End)
            return; // already in a range
        if(i == range.Start-1)
        {
            range.Update(i,range.End);
            return;
        }
        if(i == range.End + 1)
        {
            range.Update(range.Start,i);
            return;
        }
    }
    // not in any ranges
    ranges.Add(i);
}

Live example at this point: http://rextester.com/CHX64125

However this suffers from a few deficiencies

  1. Does not merge ranges (say you already have 1-10 and 12-20 and you Add(11))
  2. Does not re-order so if you have 1-5 and 20-25 and Add(7) this will be at the end not in the middle.

You can solve both problems by applying a sort after each addition, and some logic to determine if you should merge ranges

private void SortAndMerge()
{
    ranges.Sort((a,b) => a.Start - b.Start);
    var i = ranges.Count-1;
    do
    {
        var start = ranges[i].Start;
        var end = ranges[i-1].End;
        if(end == start-1)
        {
            // merge and remove
            ranges[i-1].Update(ranges[i-1].Start,ranges[i].End);
            ranges.RemoveAt(i);
        }
    } while(i-- >1);
}

This needs to be called after every change to the list.

public void Add(Range range)
{
    this.ranges.Add(range);
    SortAndMerge();
}

public void Add(int value)
{
    // is it within or contiguous to an existing range
    foreach(var range in ranges)
    {
        if(value>=range.Start && value<=range.End)
            return; // already in a range
        if(value == range.Start-1)
        {
            range.Update(value,range.End);
            SortAndMerge();
            return;
        }
        if(value == range.End + 1)
        {
            range.Update(range.Start,value);
            SortAndMerge();
            return;
        }
    }
    // not in any ranges
    ranges.Add(value);
    SortAndMerge();
}

Live example here: http://rextester.com/SYLARF47057

There are still some possible edge cases with this, which I urge you to work through.

UPDATE

The below will get this working as expected. This will merge up any added ranges/ints as you would expect and returns them correctly sorted. I've only changed the Add(Range) method, I think this is a fairly clean way of doing this.

public void Add(Range rangeToAdd)
{
    var mergableRange = new List<Range>();
    foreach (var range in ranges)
    {
        if (rangeToAdd.Start == range.Start && rangeToAdd.End == range.End)
            return; // already exists

        if (mergableRange.Any())
        {
            if (rangeToAdd.End >= range.Start - 1)
            {
                mergableRange.Add(range);
                continue;
            }
        }
        else
        {
            if (rangeToAdd.Start >= range.Start - 1
                && rangeToAdd.Start <= range.End + 1)
            {
                mergableRange.Add(range);
                continue;
            }

            if (range.Start >= rangeToAdd.Start
                && range.End <= rangeToAdd.End)
            {
                mergableRange.Add(range);
                continue;
            }
        }
    }

    if (!mergableRange.Any()) //Standalone range
    {
        ranges.Add(rangeToAdd);
    }
    else //merge overlapping ranges
    {
        mergableRange.Add(rangeToAdd);
        var min = mergableRange.Min(x => x.Start);
        var max = mergableRange.Max(x => x.End);
        foreach (var range in mergableRange) ranges.Remove(range);
        ranges.Add(new Range(min, max));
    }

    SortAndMerge();
}

Finally, we need if (ranges.Count > 1) in the SortAndMerge() method to prevent an index error when the first range is added.

And with that, I think this fully satisfies my question.

Upvotes: 3

Related Questions