Reputation: 387
So, I'm just beginning to understand Regular Expressions and I've found the learning curve fairly steep. However, stackoverflow has been immensely helpful in the process of my experimenting. There is a particular word macro that I would like to write but I have not figured out a way to do it. I would like to be able to find two words within 10 or so words of each other in a document and then italicize those words, if the words are more than 10 words apart or are in a different order I would like the macro not to italicize those words.
I have been using the following regular expression:
\bPanama\W+(?:\w+\W+){0,10}?Canal\b
However it only lets me manipulate the entire string as a whole including random words in between. Also the .Replace function only lets me replace that string with a different string not change formatting styles.
Does any more experienced person have an idea as to how to make this work? Is it even possible to do?
EDIT: Here is what I have so far. There are two problems I am having. First I don't know how to only select the words "Panama" and "Canal" from within a matched Regular Expression and replace only those words (and not the intermediate words). Second, I just don't know how to replace a Regexp that is matched with a different format, only a different string of text - probably just as a result of a lack of familiarity with word macros.
Sub RegText()
Dim re As regExp
Dim para As Paragraph
Dim rng As Range
Set re = New regExp
re.Pattern = "\bPanama\W+(?:\w+\W+){0,10}?Canal\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
Set rng = para.Range
rng.MoveEnd unit:=wdCharacter, Count:=-1
Text$ = rng.Text + "Modified"
rng.Text = re.Replace(rng.Text, Text$)
Next para
End Sub
Ok, thanks to help from Tim Williams below I got the following solution together, it's more than a little clumsy in some respects and it is by no means pure regexp but it does get the job done. If anyone has a better solution or idea about how to go about this I'd be fascinated to hear it though. Again, my brute forcing the changes with the search and replace feature is a little embarrassingly crude but at least it works...
Sub RegText()
Dim re As regExp
Dim para As Paragraph
Dim rng As Range
Dim txt As String
Dim allmatches As MatchCollection, m As match
Set re = New regExp
re.pattern = "\bPanama\W+(?:\w+\W+){0,13}?Canal\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
txt = para.Range.Text
'any match?
If re.Test(txt) Then
'get all matches
Set allmatches = re.Execute(txt)
'look at each match and hilight corresponding range
For Each m In allmatches
Debug.Print m.Value, m.FirstIndex, m.Length
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex
rng.MoveEnd wdCharacter, m.Length
rng.Font.ColorIndex = wdOrange
Next m
End If
Next para
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = True
With Selection.Find
.Text = "Panama"
.Replacement.Text = "Panama"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = True
With Selection.Find
.Text = "Canal"
.Replacement.Text = "Canal"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.ColorIndex = wdBlack
With Selection.Find
.Text = ""
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Upvotes: 6
Views: 11480
Reputation: 166126
I'm a long way off being a decent Word programmer, but this might get you started.
EDIT: updated to include a parameterized version.
Sub Tester()
HighlightIfClose ActiveDocument, "panama", "canal", wdBrightGreen
HighlightIfClose ActiveDocument, "red", "socks", wdRed
End Sub
Sub HighlightIfClose(doc As Document, word1 As String, _
word2 As String, clrIndex As WdColorIndex)
Dim re As RegExp
Dim para As Paragraph
Dim rng As Range
Dim txt As String
Dim allmatches As MatchCollection, m As match
Set re = New RegExp
re.Pattern = "\b" & word1 & "\W+(?:\w+\W+){0,10}?" _
& word2 & "\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
txt = para.Range.Text
'any match?
If re.Test(txt) Then
'get all matches
Set allmatches = re.Execute(txt)
'look at each match and hilight corresponding range
For Each m In allmatches
Debug.Print m.Value, m.FirstIndex, m.Length
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex
rng.MoveEnd wdCharacter, Len(word1)
rng.HighlightColorIndex = clrIndex
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex + (m.Length - Len(word2))
rng.MoveEnd wdCharacter, Len(word2)
rng.HighlightColorIndex = clrIndex
Next m
End If
Next para
End Sub
Upvotes: 6
Reputation: 916
If you're after just doing each 2 words at a time, this worked for me, following your practice lines.
foo([a-zA-Z0-9]+? ){0,10}bar
Explanation:
will grab word 1 (foo
), then match anything that is a word of alphanumeric characters ([a-zA-Z0-9]+?
) followed by a space (), 10 times (
{0,10}
), then word 2 (bar
).
This doesn't include full stops (didn't know if you wanted them), but if you want to just add .
after 0-9
in the regex.
So your (pseudocode) syntax will be similar to:
$matches = preg_match_all(); // Your function to get regex matches in an array
foreach (those matches) {
replace(KEY_WORD, <i>KEY_WORD</i>);
}
Hopefully it helps. Testing below, highlighted what it matched.
Worked:
The foo this that bar
blah
The foo economic order war bar
Didn't Work
The foo economic order. war bar
The global foo order has been around for several centuries, over this period of time people have evolved different and intricate trade relationships dealing with situations such as agriculture and bar
Upvotes: 0