Åsa
Åsa

Reputation: 49

Separate words from numbers using VBA

I have a list of adresses that i need to clean up.

The goal is to add a space before the numbers as shown in below examples.

16AVCHARLESDAGAULLECS10525             16 Avcharlesdagaullecs 10525
1BDHIPPOLYTEMARQUES                    1 Bdhippolytemarques
20GARRICKSTREET4THFLOOR                20 Garrickstreet 4Thfloor
2109ZAC                                2109 Zac
2-4VANDRIESSTREET4                     2-4 Vandriesstreet 4
5:ETVÄRG.19.E                          5:Etvärg. 19.E
901ACEHIGHENTTOWE9TH233                901 Acehighenttowe 9Th 233

I have found a script that puts spaces in between letters but that is not my goal. Here is the code I have so far. It does not give the result abowe.

Function Add_Spaces(ByVal sText As String) As String
   Dim CharNum As Long
   Dim FixedText As String
   Dim CharCode As Long

   FixedText = Left(sText, 1)

   For CharNum = 2 To Len(sText)
      CharCode = Asc(Mid(sText, CharNum, 1))
      If CharCode >= 65 And CharCode <= 90 Then
         FixedText = FixedText & " " & Mid(sText, CharNum, 1) 'This needs to be rewritten
      Else
         FixedText = FixedText & Mid(sText, CharNum, 1)
      End If
   Next CharNum

   Add_Spaces = FixedText
End Function

Do you have any idea on how i can solve this?

Here is a solution for regex that might give a lead: R separate words from numbers in string

Upvotes: 2

Views: 150

Answers (5)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60379

This seems to work on all of your examples:

The regular expression splits on the change from number to letter and vice versa, and also makes an exception for TH and .

Those exceptions may lead to problems in other data, but seems to work for what you present. Whether your variability is such that this will be an issue is unknown.

Note that I used early-binding for the Regex (tools/references set to include Microsoft VBScript Regular Expressions 5.5), but you can change this to late-binding if you will be distributing this code.

Edit: Note that I did NOT convert this to proper case, but that could be done if really required

`Set Reference to Microsoft VBScript Regular Expressions 5.5
Option Explicit
Function replacer(s As String) As String
    Dim RE As RegExp
    Const sPat As String = "(\dTH|\d)(?!TH)(?=[A-Z])|([A-Z.])(?=\d)"
    Const sRepl As String = "$1$2 "
    Dim sTemp As String
Set RE = New RegExp
With RE
    .Global = True
    .Pattern = sPat
    .IgnoreCase = True
    replacer = .Replace(s, sRepl)
End With

End Function

Formula would be, eg:

A2: =replacer(A2)

If PROPER case is required, then change formula to:

A2: =PROPER(replacer(A2))

this seems to work better on your data, here in the USA, than does the VBA StrConv function.

enter image description here

Upvotes: 6

FaneDuru
FaneDuru

Reputation: 42256

This function would also work on all examples. It also avoid splitting in case of 'TH' following the number:

Private Function SeparateNumbersFromString(x As String) As String
  Dim i As Long, j As Long, strInt As String, strFin As String

   For i = 1 To Len(x)
        strInt = ""
        If IsNumeric(Mid(x, i, 1)) Then
            For j = i To Len(x)
                strInt = strInt & Mid(x, j, 1)
                If Not IsNumeric(strInt) Or (Right(strInt, 1) = "-" And _
                            IsNumeric(left(strInt, Len(strInt) - 1))) Or _
                            (Right(strInt, 1) = "." And _
                               IsNumeric(left(strInt, Len(strInt) - 1))) Then
                    strFin = IIf(strFin = "", strFin, strFin & " ") & _
                                  left(strInt, Len(strInt) - 1) & _
                                   IIf(UCase(Mid(x, j, 2)) = "TH", "", " ")
                    strInt = ""
                    i = j - 1
                    Exit For
                End If
                If j >= Len(x) Then strFin = strFin & " " & strInt: GoTo Ending
            Next j
        Else
            strFin = strFin & Mid(x, i, 1)
        End If
   Next i
Ending:
   SeparateNumbersFromString = strFin
End Function

Upvotes: 1

Gary&#39;s Student
Gary&#39;s Student

Reputation: 96771

Consider:

Public Function OutString(Instring As String) As String
    Dim L As Long, i As Long, CH As String

    L = Len(Instring)
    OutString = Left(Instring, 1)

    For i = 2 To L
        CH = Mid(Instring, i, 1)
        If CH Like "[0-9]" And Not Right(OutString, 1) Like "[0-9]" Then
            OutString = OutString & " " & CH
        Else
            OutString = OutString & CH
        End If
    Next i
End Function

enter image description here

The code is similar to crawling along the string with a pointer. If the character to the right of the pointer is a numeral and the character to the left of the pointer is not a numeral, then insert a space.

Upvotes: 1

CLR
CLR

Reputation: 12279

Undoubtedly, Regex is a much tidier way of doing it - but if you wanted to alter your existing code to achieve the desired result I think this might work:

Function Add_Spaces(ByVal sText As String) As String
    Dim CharNum As Long
    Dim FixedText As String
    Dim CharCode As Long
    Dim lastCharCode As Long

    FixedText = Left(sText, 1)

    For CharNum = 2 To Len(sText)
        CharCode = Asc(Mid(sText, CharNum, 1))
        lastCharCode = Asc(Mid(sText, CharNum - 1, 1))
         If (CharCode >= 65) <> (lastCharCode >= 65) Then
            FixedText = FixedText & " " & Mid(sText, CharNum, 1) 'This needs to be rewritten
        Else
            FixedText = FixedText & Mid(sText, CharNum, 1)
        End If
    Next CharNum

    Add_Spaces = Application.WorksheetFunction.Proper(FixedText)

End Function

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522471

Try using two regex replacements on the following patterns:

([A-Za-z])([0-9])
([0-9])([A-Za-z])

and replace with $1 $2:

Dim Regex As System.Text.RegularExpressions.Regex
Dim input As String = "16AVCHARLESDAGAULLECS10525"
Dim output As String = Regex.Replace(input, "([A-Za-z])([0-9])", "$1 $2")
output = Regex.Replace(output, "([0-9])([A-Za-z])", "$1 $2")
Console.WriteLine(output)

This prints:

16 AVCHARLESDAGAULLECS 10525

The strategy here is to match every two boundary characters, in separate capture groups. A boundary here is a number followed by a letter, or vice-versa. Then, we replace with those two captured characters, with a space intercolated in between them.

Upvotes: 2

Related Questions