Reputation: 95
I have a string in an Excel cell. I need to remove certain words from the string, but only if they are separated by white space. Here is an example:
I have: hello include in the formula for attachment a dog
I want to delete "in", "for" and "a", but only as entire words, not when they are a part of another word (for example "include", "formula", "attachment".)
I should end up with: hello include the formula attachment dog
I tried:
Function REMOVETEXTS(strInput As String, rngFind As Range) As String
Dim strTemp As String
Dim strFind As String
strTemp = strInput
For Each cell In rngFind
strFind = cell.Value
strTemp = Replace(strTemp, strFind, "" "", , , 1)
Next cell
REMOVETEXTS = strTemp
But it removes, for example, "in" from "include". Any advice?
Upvotes: 0
Views: 1045
Reputation: 3259
My function splits up your original string by spaces " "
, and then compares each word of the original string against the words in the other specified range and does not include them in the result if they match.
A1
has the string you start with, and A2:A4
have all of the words that you want to exclude ("in", "for", "a"
).
If you want to have multiple delimiters beyond just spaces, you can just add them to the arguments as one big string. In the example, it would split words by any of the characters in the string " ,="
, a space, a comma, or an equals.
Option Explicit
Sub Test()
MsgBox RemoveWholeWords(Range("A1").Value2, Range("A2:A4"), " ,=")
End Sub
Function RemoveWholeWords(ByVal str As String, ByVal remove As Range, ByVal delimiters) As String
Dim i As Long
If Len(delimiters) > 1 Then
For i = 1 To Len(delimiters)
str = Replace(str, Mid(delimiters, i, 1), Right(delimiters, 1))
Next i
End If
Dim words() As String
words = Split(str, Right(delimiters, 1))
Dim removed As Boolean
For i = LBound(words) To UBound(words)
removed = False
Dim cel As Range
For Each cel In remove
If words(i) = cel.Value2 Then removed = True
Next cel
If Not removed And Len(words(i)) > 0 Then
RemoveWholeWords = Trim$(RemoveWholeWords & " " & words(i))
End If
Next i
End Function
Upvotes: 1
Reputation: 2968
This should work:
Function REMOVETEXTS(strInput As String, rngFind As Range) As String
Dim strTemp As String
strTemp = strInput
Dim cell As Range
Dim strFind As String
For t = 1 To 10 'The For Each loop has to run multiple times in cases the same search-word occurs back to back.
For Each cell In rngFind
strFind = cell.Value
strTemp = Trim(Replace(" " & strTemp & " ", " " & strFind & " ", " "))
Next cell
Next t
REMOVETEXTS = strTemp
End Function
Upvotes: 2
Reputation: 84465
The following uses a regex to do the removal. It has an additional regex pattern at the end to get rid of excessive white space left behind.
The list of words to remove is passed as a comma separated list without white space. Inside the regex function this is converted to a pattern of, in this case, \b(in|for|a)\b
. This is essentially an OR list of individual words i.e. either "in", "for" or "a"
.
There is an optional 3rd parameter that allows you to choose whether to ignore the case of the matched words. It defaults to False
.
You can use it in the sheet as an UDF.
VBA:
Option Explicit
Public Sub test()
Dim j As Long, arr()
arr = Array("hello include in the formula for attachment a dog")
For j = LBound(arr) To UBound(arr)
Debug.Print RemoveUnWantedStrings(arr(j), "in,for,a", False)
Next
End Sub
Public Function RemoveUnWantedStrings(ByVal inputString As String, ByVal sPattern As String, Optional ignoreCase As Boolean = False) As Variant
Dim matches As Object, iMatch As Object
With CreateObject("vbscript.regexp")
.Global = True
.MultiLine = True
.ignoreCase = ignoreCase
.Pattern = "\b(" & Replace$(sPattern, ",", "|") & ")\b"
If .test(inputString) Then
' Set matches = .Execute(inputString)
'For Each iMatch In matches
inputString = .Replace(inputString, vbNullString)
' Next iMatch
Else
RemoveUnWantedStrings = inputString
Exit Function
End If
.Pattern = "\s{2,}"
RemoveUnWantedStrings = .Replace(inputString, Chr$(32))
End With
End Function
In sheet UDF:
Regex: Try it here.
/
\b(in|for|a)\b
/
gm
\b
assert position at a word boundary (^\w|\w$|\W\w|\w\W)
1st Capturing Group (in|for|a)
1st Alternative in
in
matches the characters in
literally (case sensitive)
2nd Alternative for
for
matches the characters for
literally (case sensitive)
3rd Alternative a
a
matches the character a
literally (case sensitive)
\b
assert position at a word boundary (^\w|\w$|\W\w|\w\W)
Upvotes: 1