Nate

Reputation: 21

Creating a RegEx to find a sentence with a parenthetical acronym (VBasic Word)

I'm writing a script that scrubs a document to find acronyms in the format (USA). As a processing tool I need to grab the entire sentence in which that parenthetical acronym appears. Right now my code for finding the acronym is:

With oRange.Find
        .Text = "\([A-Z]{2,}\)"
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = True
        .MatchWildcards = True

Combining this with a Do While .Execute loop, I can comb the document and find the acronyms; then, using a string function, I take the acronym out of the parentheses and put it in a table. Is there a RegEx I could use that would find any sentence containing a (USA)-type acronym? As input you could use this paragraph.

Thank you very much.

Edit: I found the following RegEx and tried to make it work:

.Text = "[^.]*\([A-Z]{2,}\)[^.]*\."

But this gives me an error saying that the caret (^) can't be used in the Find function.

Upvotes: 2

Views: 392

Answers (2)

xidgel

Reputation: 3145

This wildcard pattern

[^046][!^046]*\([A-Z]{2,10}\)[!^046]*[^046]

when used in the Find dialog (with "Use wildcards" checked) will find a sentence bounded by full stops (^046).

Note that this regex returns a string with full stops on both ends, e.g.,

. A three-letter acronym (TLA) was used.

Also note that I limited acronym length to 10 chars [A-Z]{2,10}; change the upper limit as needed.
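The same pattern can presumably be driven from VBA rather than the Find dialog. The following is a minimal sketch under that assumption: the procedure name and variables are illustrative, ActiveDocument is assumed to be the document to scan, and the leading full stop is stripped from each hit before printing.

```vba
Sub FindAcronymSentencesWithWildcards()
    ' Hypothetical helper: runs the wildcard pattern above through Range.Find.
    Dim oRange As Range
    Dim sHit As String

    Set oRange = ActiveDocument.Content
    With oRange.Find
        .ClearFormatting
        .Text = "[^046][!^046]*\([A-Z]{2,10}\)[!^046]*[^046]"
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = True
        .MatchWildcards = True
        Do While .Execute
            sHit = oRange.Text
            ' The hit starts with the previous sentence's full stop;
            ' drop it and trim the space that follows.
            Debug.Print Trim$(Mid$(sHit, 2))
            ' Resume just before the closing full stop so that it can
            ' also open the next hit.
            oRange.SetRange Start:=oRange.End - 1, End:=oRange.End - 1
        Loop
    End With
End Sub
```

Because every hit has to start with a full stop, a sentence at the very beginning of the document would not be found by this sketch.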

Finally I observed that this DOES NOT find acronyms at the end of a sentence, e.g.

I used a three-letter acronym (TLA).

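The [!^046]* part of the pattern does not match a zero-length string: in Word's wildcard syntax, * is a wildcard of its own ("any string of characters") rather than a quantifier on the preceding class, so [!^046] still has to match one character. To catch those cases you would need to do a second-pass search with this: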

[^046][!^046]*\([A-Z]{2,10}\)[^046]
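A different technique that avoids the second pass is to search only for the acronym, as in the question, and then let Word expand each hit to its enclosing sentence with Range.Expand. This is a minimal sketch, not the method described above: it assumes ActiveDocument is the document to scan and that Word's own notion of a sentence boundary is acceptable (abbreviations such as "e.g." may split a sentence early). The procedure and variable names are illustrative.

```vba
Sub ListAcronymSentences()
    Dim oRange As Range      ' moves from hit to hit
    Dim oSentence As Range   ' the sentence around the current hit

    Set oRange = ActiveDocument.Content
    With oRange.Find
        .ClearFormatting
        .Text = "\([A-Z]{2,}\)"
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = True
        .MatchWildcards = True
        Do While .Execute
            ' Work on a copy so the search position is left untouched.
            Set oSentence = oRange.Duplicate
            oSentence.Expand Unit:=wdSentence
            Debug.Print oSentence.Text
            ' Continue searching after the acronym just found.
            oRange.Collapse Direction:=wdCollapseEnd
        Loop
    End With
End Sub
```

A sentence holding two acronyms is printed twice by this sketch; remembering the last oSentence.Start and skipping repeats would be one way to avoid that.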

Hope that helps

Upvotes: 1

R3uK

Reputation: 14537

I didn't manage to use that regular expression directly in the .Find method, so I used the RegExp object instead:

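Sub AcronymFinder()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
    Dim Para As Paragraph
    Dim oRange As Range
    Dim regEx As New RegExp
    Dim oMatch As Match
    Dim ACrO As String

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' No leading ".*": it is greedy and would pull every earlier
        ' sentence of the paragraph into the match.
        .Pattern = "[^.]*\([A-Z]{2,}\)[^.]*\."
    End With

    ' ThisDocument is the document that holds this macro; use
    ' ActiveDocument instead to scan whichever document is active.
    Set Para = ThisDocument.Paragraphs.First
    Do While Not Para Is Nothing
        ' Global = True, so visit every matching sentence in the paragraph.
        For Each oMatch In regEx.Execute(Para.Range.Text)
            ACrO = Trim$(oMatch.Value)
            'Debug.Print ACrO
            Set oRange = Para.Range
            With oRange.Find
                ' Find.Text accepts at most 255 characters, so a longer
                ' sentence raises an error here.
                .Text = ACrO
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = True
                .MatchWildcards = False
                ' On success, oRange is redefined to the sentence itself.
                .Execute
            End With
        Next oMatch
        Set Para = Para.Next
    Loop
End Sub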

To use the AcronymFinder macro, remember to add this reference (in the VBA editor, Tools > References):

Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
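If adding the reference is inconvenient, the same library can be reached by late binding through CreateObject. The sketch below is one possible shape for that, not part of the macro above: the function name SentencesWithAcronyms is hypothetical, and the pattern is the one from the question's edit.

```vba
' Late-bound variant: CreateObject is used, so no entry under
' Tools > References is required.
Function SentencesWithAcronyms(ByVal sText As String) As Collection
    Dim oRegEx As Object
    Dim oMatch As Object
    Dim colHits As New Collection

    Set oRegEx = CreateObject("VBScript.RegExp")
    With oRegEx
        .Global = True
        .IgnoreCase = False
        .Pattern = "[^.]*\([A-Z]{2,}\)[^.]*\."
    End With

    ' Collect every matching sentence, trimmed of surrounding spaces.
    For Each oMatch In oRegEx.Execute(sText)
        colHits.Add Trim$(oMatch.Value)
    Next oMatch

    Set SentencesWithAcronyms = colHits
End Function
```

It could be called once per paragraph, for example with ActiveDocument.Paragraphs(1).Range.Text, and the returned Collection looped over with For Each. Passing one paragraph at a time keeps paragraph marks out of the matches, since [^.] also matches a paragraph mark.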

Upvotes: 1
