Reputation: 11
I have a macro that inserts a few literal strings into an excel file to be converted to a txt file. These literal strings have some German special characters. The macro works as expected for colleagues, except for one particular Greek colleague (other Greek colleagues have no issue).
When that specific colleague reviews the macro's code in the visual basic editor, the German special characters are changed to Greek characters. If it were just visual, I wouldn't care, but this also changes the content of the txt file that is generated.
Example:
correct string = "Empfänger"
incorrect string = "Empfδnger"
I tried:
It is almost certainly related to regional settings, but I have been unsuccessful in finding what specific options it is.
Found some posts that directed me to Language for non-unicode programs settings, but that seemingly wasn't the issue.
The Region > Administrative > Language
for non-unicode programs is showing Greek for the user, but it also shows as Greek for the user in which there is no issue.
If anyone can help me track down the issue, I would greatly appreciate it.
Upvotes: 1
Views: 2400
Reputation: 3998
As you are already aware, the Visual Basic Editor is a non-Unicode app and uses a system-locale-dependent 8-bit character encoding (ANSI) for the source code. The string literals in the source code are then transcoded from this locale-specific non-Unicode codepage to UTF-16 at some point before or during compilation.
As I can not reproduce the behavior you are describing
...it also shows as Greek for the user in which there is no issue.
I can only offer a workaround in this answer, because as far as I know (and as per my testing) the thing you describe as issue is the expected behavior:
correctString = "Empfänger" 'Expected for users with e.g. German system-locale
incorrectString = "Empfδnger" 'Expected for users with Greek system-locale
If you use string literals in code (for which there are definitely many legitimate use cases) only characters from the ASCII codepoint range will behave consistently across all system locales.
You can therefore work around this problem by using some functions from my library VBA-StringTools.
The functions you need are the following:
'Replaces all occurences of unicode characters outside the codePoint range
'defined by maxNonEncodedCharCode with literals of the following formattings:
' \uXXXX for characters inside the basic multilingual plane
' \uXXXXXXXX for characters outside the basic multilingual plane
'Where:
' Xes are the digits of the codepoint in hexadecimal. (X = 0-9 or A-F)
Public Function EscapeUnicode(ByRef str As String, _
Optional ByVal maxNonEncodedCharCode As Long = &HFF) _
As String
Dim codepoint As Long
Dim i As Long
Dim j As Long: j = 1
Dim result() As String: ReDim result(1 To Len(str))
For i = 1 To Len(str)
codepoint = AscW(Mid$(str, i, 1)) And &HFFFF&
If codepoint >= &HD800& Then codepoint = AscU(Mid$(str, i, 2))
If codepoint > &HFFFF& Then 'Outside BMP
result(j) = "\u" & "00" & Right$("0" & Hex(codepoint), 6)
i = i + 1
ElseIf codepoint > maxNonEncodedCharCode Then 'BMP
result(j) = "\u" & Right$("00" & Hex(codepoint), 4)
Else
result(j) = Mid$(str, i, 1)
End If
j = j + 1
Next i
EscapeUnicode = Join(result, "")
End Function
#If Mac = 0 Then
'Replaces all occurences of unicode literals
'Accepts the following formattings `escapeFormat`:
' efPython = 1 … \uXXXX \u00XXXXXX (4 or 8 hex digits, 8 for chars outside BMP)
' efRust = 2 … \u{XXXX} \U{XXXXXX} (1 to 6 hex digits)
' efUPlus = 4 … u+XXXX u+XXXXXX (4 or 6 hex digits)
' efMarkup = 8 … &#ddddddd; (1 to 7 decimal digits)
'Where:
' - prefixes \u is case insensitive
' - Xes are the digits of the codepoint in hexadecimal. (X = 0-9 or A-F/a-f)
'Example:
' - "abcd au+0062\U0063xy\u{64}", efAll returns "abcd abcxyd"
'Notes:
' - Avoid u+XXXX syntax if string contains literals without delimiters as it
' can be misinterpreted if adjacent to text starting with 0-9 or a-f.
' - This function can be slow for very long input strings with many
' different literals
Private Function UnescapeUnicode(ByRef str As String, _
Optional ByVal allowSingleSurrogates As Boolean = False) _
As String
Const PATTERN_UNICODE_LITERALS As String = _
"\\u00[01][0-9a-f]{5}|\\u[0-9a-f]{4}|" & _
"\\u{[0-9a-f]{1,6}}|u\+[0-9a-f]{4,6}|&#\d{1,7};"
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = PATTERN_UNICODE_LITERALS
Dim mc As Object: Set mc = .Execute(str)
End With
Dim match As Variant
Dim codepoint As Long
Dim dupeCheck As Collection
Set dupeCheck = New Collection
Dim isDuplicate As Boolean
For Each match In mc
Dim mv As String: mv = match.value
On Error Resume Next
dupeCheck.Add 1, mv
isDuplicate = Err.number <> 0
On Error GoTo 0
If Not isDuplicate Then
If Left$(mv, 1) = "&" Then
codepoint = CLng(Mid$(mv, 3, Len(mv) - 3))
Else
If Mid$(mv, 3, 1) = "{" Then
codepoint = CLng("&H" & Mid$(mv, 4, Len(mv) - 4))
Else
codepoint = CLng("&H" & Mid$(mv, 3))
End If
End If
If codepoint < &H110000 Then
If codepoint < &HD800& Or codepoint >= &HE000& _
Or allowSingleSurrogates Then _
str = Replace(str, mv, ChrU(codepoint))
End If
End If
Next match
UnescapeUnicode = str
End Function
#End If
'Returns the given unicode codepoint as standard VBA UTF-16LE string
Public Function ChrU(ByVal codepoint As Long, _
Optional ByVal allowSingleSurrogates As Boolean = False) As String
Const methodName As String = "ChrU"
If codepoint < &H8000 Then Err.Raise 5, methodName, "Codepoint < -32768"
If codepoint < 0 Then codepoint = codepoint And &HFFFF& 'Incase of uInt input
If codepoint < &HD800& Then
ChrU = ChrW$(codepoint)
ElseIf codepoint < &HE000& And Not allowSingleSurrogates Then
Err.Raise 5, methodName, "Range reserved for surrogate pairs"
ElseIf codepoint < &H10000 Then
ChrU = ChrW$(codepoint)
ElseIf codepoint < &H110000 Then
codepoint = codepoint - &H10000
ChrU = ChrW$(&HD800& Or (codepoint \ &H400&)) & _
ChrW$(&HDC00& Or (codepoint And &H3FF&))
Else
Err.Raise 5, methodName, "Codepoint outside of valid Unicode range."
End If
End Function
'Returns a given characters unicode codepoint as long.
'Note: One unicode character can consist of two VBA "characters", a so-called
' "surrogate pair" (input string of length 2, so Len(char) = 2!)
Public Function AscU(ByRef char As String) As Long
AscU = AscW(char) And &HFFFF&
If Len(char) > 1 Then
Dim lo As Long: lo = AscW(Mid$(char, 2, 1)) And &HFFFF&
If &HDC00& > lo Or lo > &HDFFF& Then Exit Function
AscU = (AscU - &HD800&) * &H400& + (lo - &HDC00&) + &H10000
End If
End Function
When developing the macro, you use the EscapeUnicode
function to replace all Unicode characters outside the ASCII range with escape sequences. For the string in your question, you would put the following line into the immediate window:
?EscapeUnicode("Empfänger", 127) '127 is the last character in the ASCII range
which will return the escaped string Empf\u00E4nger
.
In your actual code, you now use this string together with the UnescapeUnicode
function:
correctString = UnescapeUnicode("Empf\u00E4nger")
The variable will now contain the correct string when inserted into a cell on the worksheet, regardless of system-locale.
Maybe this workaround is more suitable for you than creating a hidden sheet to read the string literals from.
Note that the function UnescapeUnicode
provided in this answer only works on Windows because it uses regular expressions. The function from the GitHub repository supports MacOS and performs much better, so I'd recommend just adding the LibStringTools module to your project.
Upvotes: 2