Alex
Alex

Reputation: 631

Create clean URL from text in Excel

I want to create a clean URL from a text such as this one:

Alpha Tests' Purchase of Berta Global Associates (C)

The URL should look like this:

alpha-tests-purchase-of-berta-global-associates-c

Currently I use this formula in Excel:

=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A38;"--";"-");" / ";"-");"  ";"-");": ";"-");" - ";"-");"_";"-");"?";"");",";"");".";"");"'";"");")";"");"(";"");":";"");" ";"-");"&";"and");"!";"");"/";"-");"""";""))

However, I don't seem to catch all special symbols etc. and as a consequence my URLs are not as clean as I want them to be.

Do you know an Excel formula or VBA code, which ensures that all special symbols are properly converted to a clean URL?

Thank you.

Upvotes: 1

Views: 1549

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626903

I can suggest the following Function that you can put into a VBA module and use a normal formula:

Function NormalizeToUrl(cell As Range)

Dim strPattern As String
Dim regEx As Object

Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w-]+"

With regEx
    .Global = True
    .Pattern = strPattern
End With

NormalizeToUrl = LCase(regEx.Replace(Replace(cell.Value, " ", "-"), ""))
End Function

enter image description here

The point is that we replace all spaces with hyphens at the beginning, then use a regex that matches any non-word and non-hyphen characters and remove them with RegExp.Replace.

UPDATE:

After your comments, it is still unclear what you want to do with Unicode letters. Delete or replace with hyphen. Here is a function that I tried to rebuild from your formula, but the logics may be flawed. I would prefer a generic approach above.

Function NormalizeToUrl(cell As Range)

Dim strPattern As String
Dim regEx As Object

Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w -]"

With regEx
    .Global = True
    .Pattern = "[?,.')(:!""]+" ' THESE ARE REMOVED
End With

NormalizeToUrl = regEx.Replace(cell.Value, "")
NormalizeToUrl = Replace(NormalizeToUrl, "&", "and") ' & TURNS INTO "and"

With regEx
    .Global = True
    .Pattern = strPattern ' WE REPLACE ALL NON-WORD CHARS WITH HYPHEN
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(NormalizeToUrl, " ", "-"), "-"))
With regEx
    .Global = True
    .Pattern = "--+" ' WE SHRINK ALL HYPHEN SEQUENCES TO SINGLE HYPHEN
End With
NormalizeToUrl = regEx.Replace(NormalizeToUrl, "-")
End Function

Upvotes: 3

Related Questions