Snowy
Snowy

Reputation: 6122

Fast Int Range Lookup in Multidimensional Array?

I am trying to devise a way (.NET 4.5.2) to very quickly determine if an int falls within a numeric range. Ranges do not overlap. Speed is the #1 priority for this all-in-memory operation. The below code works fine, but the actual system will have 500,000 rows sourced from a DB and I am concerned that looking for range hits in the middle of the array will incur a performance penalty. Once data is read from the DB it stays in-memory and is used as reference data in a web application.

All ideas appreciated. And thanks to https://stackoverflow.com/a/5612589/139618 for the Filter method.

// Running console app correctly shows "2288779".

static void Main( string[] args )
{
    int[,] intervals = new int[3, 3];
    intervals[0, 0] = 200;
    intervals[0, 1] = 250;
    intervals[0, 2] = 1121214;
    intervals[1, 0] = 300;
    intervals[1, 1] = 350;
    intervals[1, 2] = 2288779;
    intervals[2, 0] = 400;
    intervals[2, 1] = 450;
    intervals[2, 2] = 3300004;
    var seekIntA = 336;
    var result = Filter(intervals, u => u[0] <= seekIntA && u[1] >= seekIntA).FirstOrDefault();
    if (null != result)
    {
        Console.WriteLine(result[2]);
    }
    else
    {
        Console.WriteLine("null");
    }
}

public static IEnumerable<T[]> Filter<T>( T[,] source , Func<T[] , bool> predicate )
{
    for ( int i = 0 ; i < source.GetLength( 0 ) ; ++i )
    {
        T[] values = new T[source.GetLength( 1 )];
        for ( int j = 0 ; j < values.Length ; ++j )
        {
            values[j] = source[i , j];
        }
        if ( predicate( values ) )
        {
            yield return values;
        }
    }
}

I am open to scrapping the array idea altogether and using any other collection (lower case intentional) type to store/seek the ranges.

Thanks.

Upvotes: 1

Views: 684

Answers (1)

Assaf
Assaf

Reputation: 1370

If, as it seems, your ranges are consistent you can calculate the range in O(1) time and memory. For a more generic, albeit complicated, solution:

class Range
{
    public int min { get; private set; }
    public int max { get; private set; }

    public Range(int min, int max) {
        this.min = min;
        this.max = max;
    }
}

class MinComparer : IComparer<Range>
{
    public int Compare(Range x, Range y) {
        return (x.min - y.min);
    }
}

class MaxComparer : IComparer<Range>
{
    public int Compare(Range x, Range y) {
        return (x.max - y.max);
    }
}

class Ranges
{
    private List<Range> rangesMin;
    private List<Range> rangesMax;

    private IComparer<Range> minComparer;
    private IComparer<Range> maxComparer;

    public Ranges() {
        minComparer = new MinComparer();
        maxComparer = new MaxComparer();

        rangesMin = getRanges();
        rangesMax = new List<Range>(rangesMin);

        rangesMin.Sort(minComparer);
        rangesMax.Sort(maxComparer);
    }

    public IEnumerable<Range> getSetOfPossibleRanges(int numberToSeek) {
        Range rangeToSeek = new Range(numberToSeek, numberToSeek);
        int indexMin = rangesMin.BinarySearch(rangeToSeek, minComparer);
        int indexMax = rangesMax.BinarySearch(rangeToSeek, maxComparer);

        if(indexMin < 0) {
            indexMin = ~indexMin;
        }

        if(indexMax < 0) {
            indexMax = ~indexMax;
        }

        List<Range> subMin = rangesMin.GetRange(0, indexMin);
        List<Range> subMax = rangesMax.GetRange(indexMax, rangesMax.Count - indexMax);

        return subMin.Intersect(subMax);
    }

    private List<Range> getRanges() { //get ranges from DB here }
}

I used two lists, one sorted by the lower bound of the ranges, the other sorted by upper bound. all ranges who have the seeked number are the intersection of the subsets of those lists, where the number is greater than the lower bound in the min sorted list, and is less than the upper bound in the max sorted list.

Ranges should only be initialized at application start (does costly sorting operations on initialization).

I tested this against a solution similar to your code and found it to be much faster (tested with 1M randomized ranges).

Upvotes: 3

Related Questions