Reputation: 175
While debugging a problem with saving non-ASCII character data to the database, I narrowed the issue down to how Groovy handles my strings. I have this snippet of code:
println "Hi!"
def strings = [
    "Dies ist eine Testlinie in deutscher Sprache.",
    "C'est une ligne d'essai en français.",
    "Is é seo an líne tástála i nGaeilge..",
    "Esta é unha liña de proba en galego.",
    "Questa è una linea di prova in italiano.",
    "Dette er en test linje i norsk.",
    "Þetta er próf lína í íslensku.",
    "Góðan dag, hvernig ert þú að gera?",
    "Ich komme aus Köln"
]
strings.each {
    println it
}
I created the file by copying these lines from my browser into the GroovyConsole and saving it from there.
If I run it on the Windows command line with:
groovy testAnsi.groovy
I get:
Hi!
Dies ist eine Testlinie in deutscher Sprache.
C'est une ligne d'essai en franτais.
Is Θ seo an lφne tßstßla i nGaeilge..
Esta Θ unha li±a de proba en galego.
Questa Φ una linea di prova in italiano.
Dette er en test linje i norsk.
▐etta er pr≤f lφna φ φslensku.
G≤≡an dag, hvernig ert ■· a≡ gera?
Ich komme aus K÷ln
If I open the file with Notepad++, it says it's written in ANSI and shows lots of odd characters. If I open it with Notepad, it says it's written in ANSI and shows the proper characters.
If I then save the file with Notepad as Unicode (Notepad++ reports it as UCS-2 Little Endian and shows all the proper characters), and run it on the command line as:
groovy -c UTF-16 testUnicode.groovy
I get this:
Hi!
Dies ist eine Testlinie in deutscher Sprache.
C'est une ligne d'essai en franτais.
Is Θ seo an lφne tßstßla i nGaeilge..
Esta Θ unha li±a de proba en galego.
Questa Φ una linea di prova in italiano.
Dette er en test linje i norsk.
▐etta er pr≤f lφna φ φslensku.
G≤≡an dag, hvernig ert ■· a≡ gera?
Ich komme aus K÷ln
But when I run either the ANSI or Unicode file within the GroovyConsole, I get the expected results in the output panel.
Now, if I do:
more testAnsi.groovy
I get the same gibberish as when I run the script.
If I do:
more testUnicode.groovy
I get proper characters, except for the Icelandic ones.
Note that I get exactly the same results when I run the code on a Linux box, but there, using cat to display the content of the source files always gives me the proper characters.
I am stumped. Clearly I'm doing something wrong, but I don't know what.
How do I get Groovy to output the characters that are in my strings, just as I have them in the file?
Upvotes: 2
Views: 2088
Reputation: 536409
Groovy is outputting in the Java default character set, which for your platform is code page 1252 (Western European “ANSI”). However, the results are being displayed in code page 437, the old DOS “OEM” code page, so the same bytes show up as different characters.
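To see the mismatch in isolation, here is a minimal Java sketch (not something from your script; the class and method names are made up for illustration). It encodes a string as windows-1252 and decodes the same bytes as IBM437, which is roughly what happens between the JVM and the console. It assumes both charsets are available in your JDK, which is normally the case. The accented letters are written as \u escapes so the example does not depend on the encoding of its own source file.

```java
import java.nio.charset.Charset;

public class CodePageMismatch {
    // Encode the text with one charset, then decode the resulting bytes with
    // another, imitating a program that writes in one code page to a console
    // that reads in a different one.
    static String misread(String text, String writtenAs, String displayedAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(displayedAs));
    }

    public static void main(String[] args) {
        // "C'est une ligne d'essai en français." with the cedilla escaped
        String line = "C'est une ligne d'essai en fran\u00e7ais.";
        // What is displayed here still depends on the console's own code page.
        System.out.println(misread(line, "windows-1252", "IBM437"));
    }
}
```

The string returned by misread has τ where the ç was, because ç is byte 0xE7 in code page 1252 and byte 0xE7 is τ in code page 437. That matches the output shown in the question.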
You may be able to fix this by typing:
chcp 1252
in the command prompt before running Groovy. Naturally, the output of the program will have to fit within that code page, so you can use the Western European accents in your example, but you won't be able to write characters from any other alphabets.
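If you want to check whether a given string fits within a code page, a rough sketch like the following may help. The class and method names are hypothetical, and it again assumes the named charsets are present in your JDK. It uses the standard CharsetEncoder.canEncode check.

```java
import java.nio.charset.Charset;

public class CodePageFit {
    // True if every character of the text can be represented in the named charset.
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String icelandic = "\u00deetta er pr\u00f3f l\u00edna";        // "Þetta er próf lína"
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442";      // a Cyrillic word
        System.out.println(fits(icelandic, "windows-1252"));
        System.out.println(fits(cyrillic, "windows-1252"));
        System.out.println(fits(icelandic, "IBM437"));
    }
}
```

The Icelandic line fits in windows-1252 but the Cyrillic one does not. The Icelandic line also does not fit in IBM437, since that code page has no Þ, ð or þ, which is probably why more showed everything except the Icelandic characters for your Unicode file.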
For testing Unicode behaviour you would be well advised to avoid the Windows command prompt like the plague.
Upvotes: 4