Remove Duplicate Words from input string. Use VBA Regular Expressions

Question

Language: VBA Environment: Excel 2007 Tool: RegEx object

Hi! I'm trying to remove duplicate words from input string that represent an address. I receive an excel worksheet column with somehow combined address pieces. It is not in Cyrillic, but if to represent in English it'll look something like this:

125424, RepeatedName, RepeatedName, and some words, 75
194044, Repeated-dashedName, Repeated-dashedName, other Uniques, 3
300911, Normal non-repeated, names, dashed and non-Dashed, 123

The text is case-insensitive and can contain numbers, punctuation marks, spaces. It is known that duplicate words can only appear one exactly after another, there will not be other "exclusive" word among duplicate instances except comas and spaces. I need to remove duplicate dashed and non-dashed words and keep only single instance if each repeated word. I need to keep the only instance of "RepeatedName" same as "Repeated-dashedName". So the ideal result will look like this:

125424, RepeatedName, and some words, 75
194044, Repeated-dashedName, other Uniques, 3
300911, Normal non-repeated, names, dashed and non-Dashed, 123

To solve this I've tried different variants of code, but working one is escaping from me. My best guess is this:

Option Explicit
Dim strIn As String, strPattern As String, strReplace As String, strResult as String
dim regex As Object

strIn = fnGetInputString()
strPattern = ".*\b((\w+)\b.*\1).*"
strReplace = "$1"

If regex Is Nothing Then Set regex = New RegExp

With regex
  .MultiLine = False
  .Global = True
  .IgnoreCase = True
  .Pattern = strPattern
End With

strResult = regex.Replace(strIn, strReplace)

but as a result, my strResult consequently gave me only this:

75
3
123

So I haven't managed to correctly capture and reuse repeated group in the regex. Any help will be appreciated.

I'm new to regex but have read some documentation, articles, discussions and StackOverflow questions, but have found no working answer.

Remove Duplicate Words from input string. Use VBA Regular Expressions

Answers (1)

Related Questions