Reputation: 631
I want to create a clean URL from a text such as this one:
Alpha Tests' Purchase of Berta Global Associates (C)
The URL should look like this:
alpha-tests-purchase-of-berta-global-associates-c
Currently I use this formula in Excel:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A38;"--";"-");" / ";"-");" ";"-");": ";"-");" - ";"-");"_";"-");"?";"");",";"");".";"");"'";"");")";"");"(";"");":";"");" ";"-");"&";"and");"!";"");"/";"-");"""";""))
However, I don't seem to catch all special symbols etc. and as a consequence my URLs are not as clean as I want them to be.
Do you know an Excel formula or VBA code, which ensures that all special symbols are properly converted to a clean URL?
Thank you.
Upvotes: 1
Views: 1549
Reputation: 626903
I can suggest the following Function that you can put into a VBA module and use a normal formula:
Function NormalizeToUrl(cell As Range)
Dim strPattern As String
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w-]+"
With regEx
.Global = True
.Pattern = strPattern
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(cell.Value, " ", "-"), ""))
End Function
The point is that we replace all spaces with hyphens at the beginning, then use a regex that matches any non-word and non-hyphen characters and remove them with RegExp.Replace
.
UPDATE:
After your comments, it is still unclear what you want to do with Unicode letters. Delete or replace with hyphen. Here is a function that I tried to rebuild from your formula, but the logics may be flawed. I would prefer a generic approach above.
Function NormalizeToUrl(cell As Range)
Dim strPattern As String
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w -]"
With regEx
.Global = True
.Pattern = "[?,.')(:!""]+" ' THESE ARE REMOVED
End With
NormalizeToUrl = regEx.Replace(cell.Value, "")
NormalizeToUrl = Replace(NormalizeToUrl, "&", "and") ' & TURNS INTO "and"
With regEx
.Global = True
.Pattern = strPattern ' WE REPLACE ALL NON-WORD CHARS WITH HYPHEN
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(NormalizeToUrl, " ", "-"), "-"))
With regEx
.Global = True
.Pattern = "--+" ' WE SHRINK ALL HYPHEN SEQUENCES TO SINGLE HYPHEN
End With
NormalizeToUrl = regEx.Replace(NormalizeToUrl, "-")
End Function
Upvotes: 3