johnwangel
johnwangel

Reputation: 47

Flag special characters in document using VBA in Word

I am at a complete loss for a better way of handling this process.

The following macro analyzes every character in a document, and if the ASCII value is higher than 255, it applies a special character style to it - some for specific languages, or just 'lang' if it is not part of those languages.

The macro works fine, but on long documents, it takes a PAINFULLY long time to process. For example, I just processed a 147 page (single-spaced) document with a few lines of Greek on each page, and it took 40 minutes, in Word 2016 for Windows (by contrast, the exact same file and same code took 2 minutes on Mac).

Is there anything I could do to the code below to optimize this for Windows?

Thanks for any suggestions. John

    Sub CheckSpecialCharacters()
    'This macro looks for any characters above 255 and tags them with the appropriate existing language character.

        Dim ch As Range: Set ch = ActiveDocument.Characters(1)

        Do

            Counter = Counter + 1

            ch.Select

            myValue = AscW(Selection.Text)
            If myValue > 255 Then

                If (myValue > 8190 And myValue < 8225) Or (myValue > 288 And myValue < 381) Or (myValue > 701 And myValue < 704) Or myValue = 730 Then
                    'Ignores Curly Quotes and Transliteration punctuation

                ElseIf (myValue > 7935 And myValue < 8192) Or (myValue > 879 And myValue < 1024) Then
                    'Greek Characters get langgrk applied
                    Selection.Expand unit:=wdWord
                    Selection.Style = "langgrk"

                ElseIf (myValue > 1423 And myValue < 1535) Then
                    'Hebrew Characters get langheb applied
                    Selection.Expand unit:=wdWord
                    Selection.Style = "langheb"

                ElseIf myValue > 7679 And myValue < 7830 Then
                    'Extended transliteration characters get langtrans applied //OLD VALUES// (myValue > 288 And myValue < 381) Or (myValue > 701 And myValue < 704)
                    If HCCP = True Then Selection.Expand unit:=wdWord
                    Selection.Style = "langtrans"

                ElseIf (myValue > 19968 And myValue < 40959) Then
                    'Chinese Characters get langchin applied
                    Selection.Expand unit:=wdWord
                    Selection.Style = "langchin"

                ElseIf (myValue > 19968 And myValue < 40917) Then
                    'Japanese Characters get langjap applied
                    Selection.Expand unit:=wdWord
                    Selection.Style = "langjap"

                Else
                    If HCCP = True Then Selection.Expand unit:=wdWord
                    Selection.Style = "lang"

                End If

            End If

DoNext:


End Sub

Upvotes: 0

Views: 1799

Answers (1)

Slai
Slai

Reputation: 22876

For some reason Range.DetectLanguage doesn't seem to work on my version of Word (2007), but that might be something to look into instead of checking the character code.

The general approach to speeding up Office VBA macros is to disable the screen updating:

Application.ScreenUpdating = False
' some slow code that causes the screen to be updated
Application.ScreenUpdating = True 

That should help a bit in your case because you are using the slower Selection instead of Range.

Also, checking the byte values directly seems a bit faster than AscW:

Sub test()
    'Options.DefaultHighlightColorIndex = wdNoHighlight
    'Range.HighlightColorIndex = wdNoHighlight ' used for testing to clear Highlight

    Dim r As Range, t As Double: t = Timer
    Application.ScreenUpdating = False

    For Each r In Range.Characters ' For Each r In Range.Words is somehow about 2 times slower than .Characters
        checkRange r
    Next

    Application.ScreenUpdating = True
    Debug.Print Timer - t; Range.Words.Count; Range.Characters.Count; Range.End ' " 3.15625  8801  20601  20601 "
End Sub

Sub checkRange(r As Range)
    Dim b() As Byte, i As Long, a As Long
    b = r.Text ' converts the string to byte array (2 or 4 bytes per character)
    'Debug.Print "'" & r & "'"; r.LanguageID; r.LanguageIDFarEast; r.LanguageIDOther

    For i = 1 To UBound(b) Step 2            ' 2 bytes per Unicode codepoint
        If b(i) > 0 Then                     ' if AscW > 255
            a = b(i): a = a * 256 + b(i - 1) ' AscW
            Select Case a
                Case &H1F00 To &H1FFF: r.HighlightColorIndex = wdBlue: Exit Sub ' Greek Extended
                Case &H3040 To &H30FF: r.HighlightColorIndex = wdPink: Exit Sub ' Hiragana and Katakana
                Case &H4E00 To 40959: r.HighlightColorIndex = wdGreen: Exit Sub ' CJK Unified Ideographs

                Case 55296 To 56319: ' ignore leading High Surrogates ?
                Case 56320 To 57343: ' ignore trailing Low Surrogates ?

                Case Else: r.HighlightColorIndex = wdRed: Debug.Print Hex(a), r.End - r.Start ' other
            End Select
        End If
    Next
End Sub

Few of the Unicode code points in your code like 8190 seem a bit off, so you can check them at http://www.fileformat.info/info/unicode/block/index.htm

Upvotes: 1

Related Questions