Melipao
Melipao

Reputation: 31

How to reduce an array of ranges to a minimal set of ranges using Linq

I have an array of alphanumeric postal codes that could look something like this:

PostalCodes(0) = "AA000-DD130"
PostalCodes(1) = "DD131-DD150"
PostalCodes(2) = "DD151-EE180"
PostalCodes(3) = "EE300-EE600"
PostalCodes(4) = "EE450-EE700"
PostalCodes(5) = "EE800"
PostalCodes(6) = "EE810"
PostalCodes(7) = "EE811"
PostalCodes(8) = "EE812"
PostalCodes(9) = "EE813"
PostalCodes(10) = "EE814"
PostalCodes(11) = "EE815"

And I want it to optimize to something like this:

PostalCodes(0) = "AA000-EE180"
PostalCodes(1) = "EE300-EE700"
PostalCodes(2) = "EE800"
PostalCodes(3) = "EE810-EE815"

As you can see ranges can overlap or there might be gaps, its ok, I only want to optimize(reduce) the postal codes as much as I can. I already have a code using For loops but I would like to know if there is a way to use Linq to do this task faster and improve performance?

I'm using vb.net.

Thanks in advance.

Upvotes: 0

Views: 81

Answers (2)

Enigmativity
Enigmativity

Reputation: 117124

This pretty much does it:

Dim PostalCodes(11) As String
PostalCodes(0) = "AA000-DD130"
PostalCodes(1) = "DD131-DD150"
PostalCodes(2) = "DD151-EE180"
PostalCodes(3) = "EE300-EE600"
PostalCodes(4) = "EE450-EE700"
PostalCodes(5) = "EE800"
PostalCodes(6) = "EE810"
PostalCodes(7) = "EE811"
PostalCodes(8) = "EE812"
PostalCodes(9) = "EE813"
PostalCodes(10) = "EE814"
PostalCodes(11) = "EE815"

Dim splits = _ 
    PostalCodes _
        .Select(Function (x) If(x.Contains("-"), x.Split("-"c), { x, x })) _
        .Select(Function (ps) ps.Select(Function (p) New With _
        { _
            .Prefix = p.Substring(0, 2), _
            .Value = Integer.Parse(p.Substring(2)) _
        }).ToArray()) _
        .ToArray()

Dim results = _
    splits _
        .Skip(1) _
        .Aggregate( _
            splits.Take(1).ToList(), _
            Function (a, x)
                Dim l = a.Last()
                If x(0).Prefix = l(1).Prefix AndAlso x(0).Value <= l(1).Value + 1 Then
                    a.RemoveAt(a.Count - 1)
                    a.Add( _
                    { _
                        New With _
                        { _
                            .Prefix = l(0).Prefix, _
                            .Value = l(0).Value _
                        }, _
                        New With _
                        { _
                            .Prefix = l(1).Prefix, _
                            .Value = x(1).Value _
                        } _                     
                    })
                Else
                    a.Add(x)
                End If
                Return a
            End Function) _
        .Select(Function (xs) String.Format("{0}{1:000}-{2}{3:000}", xs(0).Prefix, xs(0).Value, xs(1).Prefix, xs(1).Value)) _
        .ToArray()

It gives:

AA000-DD180 
EE300-EE700 
EE800-EE800 
EE810-EE815 

A small bit of work remaining to get rid of the double "EE800-EE800".

Upvotes: 1

NetMage
NetMage

Reputation: 26927

Using an extension method to convert ranges into tuples with StartPrefix, Start, EndPrefix, End values:

<Extension()>
Public Function RangeToTuple(ByVal strRange As String) As (StartPrefix As String, Start As Integer, EndPrefix As String, [End] As Integer
    If strRange.Contains("-") Then
        Dim twoRanges = strRange.Split("-")
        Dim intRanges = twoRanges.Select(Function(rs) rs.Substring(2).ToInteger()).ToList()
        Return (twoRanges(0).Substring(0, 2), intRanges(0), twoRanges(1).Substring(0, 2), intRanges(1))
    Else
        Dim intRange = strRange.Substring(2).ToInteger()
        Return (strRange.Substring(0, 2), intRange, strRange.Substring(0, 2), intRange)
    End If
End Function

Which uses a simple extension for converting String to Integer:

<Extension()>
Public Function ToInteger(s As String) As Integer
    Return Convert.ToInt32(s)
End Function

Using a custom extension method that is a variation of the APL scan operator that works like Aggregate but returns the intermediate values, and uses a ValueTuple to return the original values with the intermediate calculations:

<Extension()>
Public Iterator Function ScanPair(Of T, TKey)(src As IEnumerable(Of T), seedKey As TKey, combine As Func(Of (Key As TKey, Value As T), T, TKey)) As IEnumerable(Of (Key As TKey, Value As T))
    Using srce = src.GetEnumerator()
        If srce.MoveNext() Then
            Dim prevkv = (seedKey, srce.Current)

            While srce.MoveNext()
                Yield prevkv
                prevkv = (combine(prevkv, srce.Current), srce.Current)
            End While
            Yield prevkv
        End If
    End Using
End Function

And an extension method that groups based on a predicate:

<Extension()>
Public Function GroupByWhile(Of T, TRes)(src As IEnumerable(Of T), test As Func(Of T, T, Boolean), result As Func(Of T, TRes)) As IEnumerable(Of IGrouping(Of Integer, TRes))
    Return src.ScanPair(1, Function(kvp, cur) If(test(kvp.Value, cur), kvp.Key, kvp.Key + 1)) _
              .GroupBy(Function(kvp) kvp.Key, Function(kvp) result(kvp.Value))
End Function

<Extension()>
Public Function GroupByWhile(Of T)(src As IEnumerable(Of T), test As Func(Of T, T, Boolean)) As IEnumerable(Of IGrouping(Of Integer, T))
    Return src.GroupByWhile(test, Function(e) e)
End Function

Then you can use LINQ to process the tuples:

Dim combinedRanges = input.Select(Function(rs) rs.RangeToTuple()) _
                          .GroupByWhile(Function(prev, cur) prev.EndPrefix = cur.StartPrefix And cur.Start <= prev.End+1 And prev.End <= cur.End) _
                          .Select(Function(kvpg) (kvpg.First().StartPrefix, kvpg.First().Start, kvpg.Last().EndPrefix, kvpg.Last().End)) _
                          .Select(Function(tp) If(tp.Start = tp.End, $"{tp.StartPrefix}{tp.Start}", _
                                                                     $"{tp.StartPrefix}{tp.Start}-{tp.EndPrefix}{tp.End}"))

NOTE: After writing this answer, I realized I could generalize my sequence grouping constructs to group on a predicate, and make it more accessible.

Converted to VB per comment.

Upvotes: 0

Related Questions