bathtubandatoaster
bathtubandatoaster

Reputation: 67

Manipulate string to extract address

I'm currently doing some work with a very large data source on city addresses where the data looks something like this.

137 is the correct address but it belongs in a building that takes up 135-138A on the street.

source:

137 9/F 135-138A KING STREET 135-138A KING STREET TOR

i've used a function which removes the duplicates shown on extendoffice.

the second column has become this:

137 9/F 135-138A KING STREET TOR

what I want to do now is

9/F 137 KING STREET TOR

Would the the best way to accomplish this?

The main problem I'm having with this is there are many inconsistent spaces in address names ex. "van dyke rd".

Is there anyway I can locate in an array the "-" and set variables for the 2 numbers on either side of the dash and replace it with the correct address number located at the front

Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
Dim x

With CreateObject("Scripting.Dictionary")
    .CompareMode = vbTextCompare
    For Each x In Split(txt, delim)
        If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
    Next
    If .Count > 0 Then RemoveDupes2 = Join(.keys, delim)
End With
End Function

Thanks

Upvotes: 2

Views: 137

Answers (3)

user3598756
user3598756

Reputation: 29421

I'd just:

  • swap 1st and 2nd substrings

  • erase the substring with "-" in it

    Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
        Dim x As Variant, arr As Variant, temp As Variant
        Dim iArr As Long
    
        With CreateObject("Scripting.Dictionary")
            .CompareMode = vbTextCompare
            For Each x In Split(txt, delim)
                If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
            Next
            If .count > 0 Then
                arr = .keys
                temp = arr(0)
                arr(0) = arr(1)
                arr(1) = temp
                For iArr = LBound(arr) To UBound(arr)
                    If InStr(arr(iArr), "-") <> 0 Then arr(iArr) = ""
                Next
                RemoveDupes2 = Join(arr, delim)
            End If
        End With
    End Function
    

Upvotes: 0

brettdj
brettdj

Reputation: 55682

This meets the case you want, it captures the address range as two separate matches (if you want to process further).

The current code simple removes this range altogether.

What logic is there to move the 9/F to front?

See regex here

Function StripString(strIn As String) As String
 Dim objRegex As Object
 Set objRegex = CreateObject("vbscript.regexp")
 With objRegex
       .Pattern = "(\d+[A-C]?)-(\d+[A-C]?)"
        If .test(strIn) Then
            StripString = .Replace(strIn, vbullstring)
        Else
           StripString = "No match"
        End If
    End With
End Function

Upvotes: 1

Jemma
Jemma

Reputation: 112

Regular Expressions are a way to (amongst other things) search for a feature in a string.

It looks like the feature you are looking for is: number:maybe some spaces : dash : maybe some spaces : number

In regex notation this would be expressed as: ([0-9]*)[ ]*-[ ]*([0-9]*) Which translates to: Find a sequential group of digits followed by zero or more spaces, then a dash, then zero or more spaces, then some more digits.

The parenthesis indicate the elements that will be returned. So you could assign variables to the be the first number or the second number.

You might need to tweak this if a dash can potentially occur elsewhere in the address.

Further information on actually implementing that is available here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

Upvotes: 1

Related Questions