Reputation: 86
I'm facing a problem with umlauts in Groovy/Java on an Ubuntu server.
This Groovy code returns false from exists() for files with umlauts in their names:
def f1 = new File('/var/lib/jenkins/test/')
def files = [:]
f1.listFiles().each {
    files.put(it.name, it.getAbsoluteFile().exists())
}
println files
println 'file.encoding:' + System.getProperty('file.encoding')
Results in:
Verderblichkeit.docx:true
Gefa��hrlichkeit.docx:false
file.encoding:"iso-8859-1"
So exists() returns false for a file that listFiles() itself found. That is wrong.
ls -al in the directory:
drwxr-xr-x 2 jenkins jenkins 4096 Jan 5 18:17 .
drwxr-xr-x 66 jenkins jenkins 12288 Jan 5 18:16 ..
-rw-r--r-- 1 jenkins jenkins 98035 Jan 5 18:16 Gefährlichkeit.docx
-rw-r--r-- 1 jenkins jenkins 277515 Jan 5 18:17 Verderblichkeit.docx
In Linux I can copy, mv, or rename the files and see the umlauts.
Environment:
Note: The original problem is that the file path comes from a database. The file can be found and served through nginx, but in the Java app (Grails with Groovy files) File.exists() returns false.
What can I do?
I tried setting file.encoding to UTF-8, both in the application environment and via a -D parameter at startup. I searched the web but didn't find a solution.
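One way to narrow this down is to print the code points of each name the JVM returns from listFiles(). The following is only a diagnostic sketch: the class name NameDump and its helper methods are made up for illustration. If a name contains U+FFFD (the replacement character), the JVM failed to decode the file name bytes, which would explain why the path built from that name no longer matches the file on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class NameDump {
    // Render a string as its Unicode code points, e.g. "U+0061 U+0308".
    static String codePoints(String name) {
        StringBuilder sb = new StringBuilder();
        name.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // One line per directory entry: name, exists() result, code points.
    static List<String> report(File dir) {
        List<String> lines = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return lines; // not a directory, or an I/O error occurred
        }
        for (File f : entries) {
            lines.add(f.getName() + " exists=" + f.exists()
                    + " [" + codePoints(f.getName()) + "]");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        report(dir).forEach(System.out::println);
    }
}
```

Run it with the directory as the first argument, for example /var/lib/jenkins/test/, under the same user and environment as the application.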
Upvotes: 2
Views: 181
Reputation: 86
The problem occurred in different environments:
Short answer: The problem was a wrong setting for sun.jnu.encoding. The solution was to set it correctly in each environment.
Long answer: We had to set the Java system property 'sun.jnu.encoding' in each of the environments:
Set system properties in the bootRun section in build.gradle:
bootRun {
    jvmArgs(
        '-Dsun.jnu.encoding=UTF-8',
        '-Dfile.encoding=UTF-8',
        ...)
}
Set system properties in setenv.sh in tomcat/bin:
export JAVA_OPTS="-Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 $JAVA_OPTS"
For the container, we used this solution: https://stackoverflow.com/a/28406007/14748724. This required rebuilding the container image.
Finally we had to set this in the docker-compose.yaml file:
tomcat:
  environment:
    LC_ALL: 'en_US.UTF-8'
Before, it was LC_ALL: 'C', which was wrong.
Note: The setenv.sh solution from environment 2 didn't work in the container!
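To check that the settings actually reached the JVM in a given environment, something like the following sketch can be run there. The class name EncodingCheck is made up for illustration. Note that sun.jnu.encoding is an internal property of HotSpot/OpenJDK-based JVMs and may be null on other JVMs.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Collect the encoding-related settings visible to this JVM.
    static String describe() {
        return "sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
                + "\nfile.encoding=" + System.getProperty("file.encoding")
                + "\ndefaultCharset=" + Charset.defaultCharset()
                + "\nLC_ALL=" + System.getenv("LC_ALL")
                + "\nLANG=" + System.getenv("LANG");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If sun.jnu.encoding still shows something other than UTF-8, the options or the locale were probably not picked up by the process that starts the JVM.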
Upvotes: 1
Reputation: 4296
This is not an answer as such, but it allows me to show the problem with Unicode composition and file names. Let's create two files with the same name:
goose@t410:/tmp$ touch $(echo -e '\x61\xCC\x88.txt')
goose@t410:/tmp$ touch $(echo -e '\xC3\xA4.txt')
goose@t410:/tmp$ ls *.txt
ä.txt ä.txt
What!? Hang on, this is a trick, isn't it? Are they really the same file? Here's proof that they are different:
goose@t410:/tmp$ ls -i *.txt
131467 ä.txt 131527 ä.txt
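The same two byte sequences can be reproduced on the Java side. This is a minimal sketch (the class name UmlautForms and its helpers are made up for illustration) that uses java.text.Normalizer to show that the decomposed and precomposed spellings are different strings with different UTF-8 bytes, and that they only compare equal after both are normalized to the same form:

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class UmlautForms {
    // Hex dump of the UTF-8 bytes of a string, e.g. "c3 a4".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Compare two names after normalizing both to NFC (composed form).
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308.txt";  // 'a' followed by U+0308 COMBINING DIAERESIS
        String precomposed = "\u00e4.txt";  // single code point U+00E4

        System.out.println(utf8Hex(decomposed));                   // 61 cc 88 2e 74 78 74
        System.out.println(utf8Hex(precomposed));                  // c3 a4 2e 74 78 74
        System.out.println(decomposed.equals(precomposed));        // false
        System.out.println(sameAfterNfc(decomposed, precomposed)); // true
    }
}
```

So a path string taken from a database in one form will not match a file stored on disk under the other form, unless one side is normalized first.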
Upvotes: 0