Reputation: 47
I am at a complete loss for a better way of handling this process.
The following macro analyzes every character in a document, and if the ASCII value is higher than 255, it applies a special character style to it - some for specific languages, or just 'lang' if it is not part of those languages.
The macro works fine, but on long documents, it takes a PAINFULLY long time to process. For example, I just processed a 147 page (single-spaced) document with a few lines of Greek on each page, and it took 40 minutes, in Word 2016 for Windows (by contrast, the exact same file and same code took 2 minutes on Mac).
Is there anything I could do to the code below to optimize this for Windows?
Thanks for any suggestions. John
Sub CheckSpecialCharacters()
'This macro looks for any characters above 255 and tags them with the appropriate existing language character.
Dim ch As Range: Set ch = ActiveDocument.Characters(1)
Do
Counter = Counter + 1
ch.Select
myValue = AscW(Selection.Text)
If myValue > 255 Then
If (myValue > 8190 And myValue < 8225) Or (myValue > 288 And myValue < 381) Or (myValue > 701 And myValue < 704) Or myValue = 730 Then
'Ignores Curly Quotes and Transliteration punctuation
ElseIf (myValue > 7935 And myValue < 8192) Or (myValue > 879 And myValue < 1024) Then
'Greek Characters get langgrk applied
Selection.Expand unit:=wdWord
Selection.Style = "langgrk"
ElseIf (myValue > 1423 And myValue < 1535) Then
'Hebrew Characters get langheb applied
Selection.Expand unit:=wdWord
Selection.Style = "langheb"
ElseIf myValue > 7679 And myValue < 7830 Then
'Extended transliteration characters get langtrans applied //OLD VALUES// (myValue > 288 And myValue < 381) Or (myValue > 701 And myValue < 704)
If HCCP = True Then Selection.Expand unit:=wdWord
Selection.Style = "langtrans"
ElseIf (myValue > 19968 And myValue < 40959) Then
'Chinese Characters get langchin applied
Selection.Expand unit:=wdWord
Selection.Style = "langchin"
ElseIf (myValue > 19968 And myValue < 40917) Then
'Japanese Characters get langjap applied
Selection.Expand unit:=wdWord
Selection.Style = "langjap"
Else
If HCCP = True Then Selection.Expand unit:=wdWord
Selection.Style = "lang"
End If
End If
DoNext:
End Sub
Upvotes: 0
Views: 1799
Reputation: 22876
For some reason Range.DetectLanguage
doesn't seem to work on my version of Word (2007), but that might be something to look into instead of checking the character code.
The general approach to speeding up Office VBA macros is to disable the screen updating:
Application.ScreenUpdating = False
' some slow code that causes the screen to be updated
Application.ScreenUpdating = True
That should help a bit in your case because you are using the slower Selection
instead of Range
.
Also, checking the byte values directly seems a bit faster than AscW
:
Sub test()
'Options.DefaultHighlightColorIndex = wdNoHighlight
'Range.HighlightColorIndex = wdNoHighlight ' used for testing to clear Highlight
Dim r As Range, t As Double: t = Timer
Application.ScreenUpdating = False
For Each r In Range.Characters ' For Each r In Range.Words is somehow about 2 times slower than .Characters
checkRange r
Next
Application.ScreenUpdating = True
Debug.Print Timer - t; Range.Words.Count; Range.Characters.Count; Range.End ' " 3.15625 8801 20601 20601 "
End Sub
Sub checkRange(r As Range)
Dim b() As Byte, i As Long, a As Long
b = r.Text ' converts the string to byte array (2 or 4 bytes per character)
'Debug.Print "'" & r & "'"; r.LanguageID; r.LanguageIDFarEast; r.LanguageIDOther
For i = 1 To UBound(b) Step 2 ' 2 bytes per Unicode codepoint
If b(i) > 0 Then ' if AscW > 255
a = b(i): a = a * 256 + b(i - 1) ' AscW
Select Case a
Case &H1F00 To &H1FFF: r.HighlightColorIndex = wdBlue: Exit Sub ' Greek Extended
Case &H3040 To &H30FF: r.HighlightColorIndex = wdPink: Exit Sub ' Hiragana and Katakana
Case &H4E00 To 40959: r.HighlightColorIndex = wdGreen: Exit Sub ' CJK Unified Ideographs
Case 55296 To 56319: ' ignore leading High Surrogates ?
Case 56320 To 57343: ' ignore trailing Low Surrogates ?
Case Else: r.HighlightColorIndex = wdRed: Debug.Print Hex(a), r.End - r.Start ' other
End Select
End If
Next
End Sub
Few of the Unicode code points in your code like 8190
seem a bit off, so you can check them at
http://www.fileformat.info/info/unicode/block/index.htm
Upvotes: 1