Reputation: 21
I'm writing a script that scrubs a document to find acronyms in the format (USA). As a processing tool I need to grab the entire sentence in which that parenthetical acronym appears. Right now my code for finding the acronym is:
With oRange.Find
    .Text = "\([A-Z]{2,}\)"
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchCase = True
    .MatchWildcards = True
End With
Combining this with a Do While .Execute loop, I can comb the doc and find the acronyms; then, using a string function, I take the acronym out of the parentheses and put it in a table. Is there a RegEx I could use to find any sentence that contains a (USA)-type acronym? As an input you could use this paragraph.
Thank you very much.
edit: I found the following Regex to try and make it work:
.Text = "[^.]*\([A-Z]{2,}\)[^.]*\."
But this gives me an error saying that the caret can't be used in the Find function.
Upvotes: 2
Views: 392
Reputation: 3145
This wildcard pattern
[^046][!^046]*\([A-Z]{2,10}\)[!^046]*[^046]
when used in the Find dialog (with "Use wildcards" enabled) will find a sentence bounded by full stops (^046 is the character code for a full stop).
Note that this regex returns a string with full stops on both ends, e.g.,
. A three-letter acronym (TLA) was used.
Also note that I limited acronym length to 10 chars [A-Z]{2,10}; change the upper limit as needed.
Finally I observed that this DOES NOT find acronyms at the end of a sentence, e.g.
I used a three-letter acronym (TLA).
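The [!^046]* part of the pattern does not appear to match a zero-length string. This is probably because, in Word's wildcard syntax, * is not a repetition operator but stands for any string of characters, so [!^046]* still needs at least one character that is not a full stop. To catch those cases you would need to do a second-pass search with this: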
[^046][!^046]*\([A-Z]{2,10}\)[^046]
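The two-pass search described above could be scripted roughly as follows. This is only a sketch: the procedure name ListAcronymSentences is made up for illustration, it assumes the macro runs against ActiveDocument, and it uses the two patterns exactly as given above without checking them beyond what this answer already reports. Each hit is printed to the Immediate window, still with a full stop on both ends.

```vba
Sub ListAcronymSentences()
    ' Hypothetical helper: runs both wildcard patterns from this answer.
    Dim patterns As Variant
    Dim i As Long
    Dim rng As Range

    ' Pass 1: acronym inside the sentence. Pass 2: acronym right before the full stop.
    patterns = Array("[^046][!^046]*\([A-Z]{2,10}\)[!^046]*[^046]", _
                     "[^046][!^046]*\([A-Z]{2,10}\)[^046]")

    For i = LBound(patterns) To UBound(patterns)
        Set rng = ActiveDocument.Content
        With rng.Find
            .ClearFormatting
            .Text = patterns(i)
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchCase = True
            .MatchWildcards = True
            Do While .Execute
                Debug.Print rng.Text
                ' Resume just before the closing full stop, because that
                ' full stop also opens the next sentence.
                rng.SetRange rng.End - 1, rng.End - 1
            Loop
        End With
    Next i
End Sub
```

Restarting one character before the end of each match is a design choice: collapsing to the very end would skip a sentence that immediately follows another hit, since both share the same full stop.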
Hope that helps
Upvotes: 1
Reputation: 14537
I didn't manage to use that regular expression directly in the .Find method, so I used RegExp directly:
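Sub AcronymFinder()
    Dim Para As Paragraph
    Dim ParaNext As Paragraph
    Dim oRange As Range
    Dim regEx As New RegExp
    Dim ACrO As String   ' holds the whole matched sentence, not just the acronym

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' A run of non-period characters containing a parenthesised acronym,
        ' up to and including the closing period. (A leading ".*" would also
        ' swallow the preceding sentences, so it is left out.)
        .Pattern = "[^.]*(\([A-Z]{2,}\))[^.]*\."
    End With

    Set Para = ThisDocument.Paragraphs.First
    Do While Not Para Is Nothing
        Set ParaNext = Para.Next
        Set oRange = Para.Range
        'Debug.Print oRange.Text
        If regEx.Test(oRange.Text) Then
            ' Only the first matching sentence of each paragraph is used here
            ACrO = CStr(regEx.Execute(oRange.Text)(0))
            'Debug.Print ACrO
            ' Locate that sentence in the paragraph; on success oRange is the found text
            With oRange.Find
                .Text = ACrO
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = True
                .MatchWildcards = False
                .Execute
            End With
        End If
        Set Para = ParaNext
    Loop
End Sub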
To use it, remember to add this reference (Tools > References in the VBA editor):
Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
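If adding the reference is not an option, the same regex object can be created with late binding. The function below is a minimal sketch along those lines, not part of the original answer: the name SentencesWithAcronyms is hypothetical, it works on a plain string rather than on the document, and it assumes sentences end with a period. It returns every matching sentence, not only the first one.

```vba
Function SentencesWithAcronyms(ByVal sourceText As String) As Collection
    ' Hypothetical helper: collects each sentence containing a (USA)-style acronym.
    Dim re As Object
    Dim m As Object
    Dim result As New Collection

    ' Late binding, so no project reference is required
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True          ' return all matches, not just the first
        .IgnoreCase = False     ' acronyms must be upper case
        .Pattern = "[^.]*\([A-Z]{2,}\)[^.]*\."
    End With

    For Each m In re.Execute(sourceText)
        result.Add Trim$(m.Value)   ' drop the space left over from the previous sentence
    Next m

    Set SentencesWithAcronyms = result
End Function
```

It could be called per paragraph, for example with SentencesWithAcronyms(Para.Range.Text), and the returned strings then written to the table.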
Upvotes: 1