Reputation: 41
Trying to separate millions of street names from their suffixes ("Ave", "Dr", "St", etc.) in Excel. Sometimes streets do not have a suffix, so an empty cell is needed for those.
I tried using a formula at first, but ran into some issues:
Listed the suffixes with the highlighted example formula below and then using CONCAT in the last column to get them all in one column , but sometimes it also grabs directionals ("S", "N", "E", "W") after the suffix,
lists two suffixes with the same letters (e.g. "CI", "CIR"),
or grabs the street name if it includes the same letters as a suffix (see "ST" column).
I've also tried Flash Fill, but this yields worse results.
I came to the conclusion VBA would be most efficient because I will have to repeat this process for multiple datasets, but if there are other suggestions I am open.
I am very novice in VBA, but this is what I have so far and I'm not sure how to expand/continue it:
Sub getSuffix()
Dim i As Range
For Each i In Range("A1")
Do
If InStr(ActiveCell.Value, " AVE") > 0 Then
ActiveCell.Offset(0, 1).Value = "AVE"
ActiveCell.Offset(1, 0).Select
Else
If IsEmpty(ActiveCell.Value) Then
Exit For
End If
End If
Loop
Next i
End Sub
Upvotes: 1
Views: 385
Reputation: 15357
AFAICT the rules are as follows:
Ave
, Dr
, St
,Cir
)I would suggest using regular expressions for this.
Add a reference (Tools -> References...) to the Microsoft VBScript Regular Expressions 5.5 library.
Then you can write code like the following:
Option Explicit
Dim re As RegExp
Function GetSuffix(address As String) As String
If re Is Nothing Then
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = " (Ave|Dr|St|Cir)( |$)"
End If
Dim matches As MatchCollection
Set matches = re.Execute(address)
If matches.Count > 0 Then GetSuffix = matches(0).SubMatches(0)
End Function
and use the function in a cell formula:
=GetSuffix(A4)
It will return the first matched suffix, excluding any subsequent characters.
You probably will want to add more suffixes, e.g.:
re.Pattern = " (Ave|Dr|St|Cir|Wy|Ci|Pl|Aisle)( |$)"
Links:
Upvotes: 2
Reputation: 42236
Try the next code, please. It should be fast enough, using arrays, working only in memory and dropping the processing result at once:
Sub SplitBySuffix()
Dim sh As Worksheet, lastR As Long, i As Long
Dim arr, arrStr, arrS, arrSuff, arrInt, boolFound As Boolean
arrSuff = Split("AVE,PL,LN,ST,DR,WY", ",") 'put here all the necessary suffixes (Upper case)
'"AVE" is different than ".AVE"...
Set sh = ActiveSheet 'use here your necessary sheet
lastR = sh.Range("A" & sh.rows.count).End(xlUp).row 'last row of A:A
arr = sh.Range("A2:A" & lastR).value 'put the range to be processed in an array
ReDim arrStr(1 To UBound(arr), 1 To 1) 'redim the street array
ReDim arrS(1 To UBound(arr), 1 To 1) 'redim the suffix array
For i = 1 To UBound(arr) 'iterate between the array to be processed elements
arrInt = Split(arr(i, 1), " ") 'split the array element by space (" ")
'check if the last array element is a suffix:
If Not IsError(Application.match(UCase(arrInt(UBound(arrInt))), arrSuff, 0)) Then
arrS(i, 1) = arrInt(UBound(arrInt)): boolFound = True
End If
If boolFound Then 'if a suffix has been found:
'eliminate the last array element_________________________
arrInt(UBound(arrInt)) = arrInt(UBound(arrInt)) & "##$$@@"
arrInt = Filter(arrInt, arrInt(UBound(arrInt)), False)
'_________________________________________________________
arrStr(i, 1) = Join(arrInt, " ") 'put in the street array the joined filter result
boolFound = False 'reinitialize the boolean variable
Else
arrStr(i, 1) = arr(i, 1) 'put in the street array the element as it is
End If
Next i
'drop the arrays content at once:
sh.Range("B2").Resize(lastR - 1, 1) = arrStr
sh.Range("C2").Resize(lastR - 1, 1) = arrS
End Sub
Upvotes: 1