Dirk27

Reputation: 86

File.list()*.exists() not true for files with umlauts

I'm facing a problem with umlauts in Groovy/Java on an Ubuntu server.

This Groovy code returns false from exists() for files with umlauts in their names:

def f1 = new File('/var/lib/jenkins/test/')
def files = [:]
f1.listFiles().each {
  files.put(it.name, it.getAbsoluteFile().exists())
}
println files
println 'file.encoding:' + System.getProperty('file.encoding')

Results in:

Verderblichkeit.docx:true
Gefa��hrlichkeit.docx:false
file.encoding:"iso-8859-1"

So exists() returns false for a file that listFiles() itself just returned. That is wrong.

ls -al in the directory:

drwxr-xr-x  2 jenkins jenkins   4096 Jan  5 18:17 .
drwxr-xr-x 66 jenkins jenkins  12288 Jan  5 18:16 ..
-rw-r--r--  1 jenkins jenkins  98035 Jan  5 18:16 Gefährlichkeit.docx
-rw-r--r--  1 jenkins jenkins 277515 Jan  5 18:17 Verderblichkeit.docx

On Linux I can cp, mv, or rename the files, and the umlauts display correctly.

Environment:

Note: The original problem is that the file path comes from a database. The file can be found and served through nginx, but in the Java app (Grails with Groovy files) File.exists() returns false.

What can I do?

I tried setting file.encoding to UTF-8, both in the application environment and with a -D parameter at startup. I searched the web but didn't find a solution.
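
When debugging this kind of issue it can help to print every setting that may influence how the JVM encodes file names, not only file.encoding. The following is a minimal Java sketch, not part of the original post. The class name EncodingReport is hypothetical, and sun.jnu.encoding is an internal property of OpenJDK/HotSpot-based JVMs, so it may be missing on other JVMs.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers encoding-related settings of the running JVM.
public class EncodingReport {

    /** Collects properties and environment variables related to encodings. */
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        // Default charset for reading and writing file *contents*.
        report.put("file.encoding", System.getProperty("file.encoding"));
        // Internal OpenJDK property; assumed here to govern file *names*.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        report.put("defaultCharset", Charset.defaultCharset().name());
        // Locale variables of the process; values are null when unset.
        report.put("LC_ALL", System.getenv("LC_ALL"));
        report.put("LANG", System.getenv("LANG"));
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running this inside the same process as the application (for example from a Grails console or a startup hook) shows what the JVM actually picked up, which may differ from what was passed on the command line.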

Upvotes: 2

Views: 181

Answers (2)

Dirk27

Reputation: 86

Solution

The problem occurred in different environments:

  1. development env: grails 4 application started with gradle bootRun
  2. CI-stage with a tomcat 9 server
  3. production env: tomcat running in a docker container

Short answer: sun.jnu.encoding was set incorrectly. The solution was to set it correctly in each env.

Long answer: We had to set the Java system property 'sun.jnu.encoding' in each env:

1. dev env

Set system properties in the bootRun section in build.gradle:

bootRun {
    jvmArgs(
        '-Dsun.jnu.encoding=UTF-8',
        '-Dfile.encoding=UTF-8',
        ...)
}

2. tomcat 9 on server

Set system properties in setenv.sh in tomcat/bin:

export JAVA_OPTS="-Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 $JAVA_OPTS"

3. tomcat 9 in docker container in prod env

We used this solution: https://stackoverflow.com/a/28406007/14748724. It requires rebuilding the container image.

Finally we had to set this in the docker-compose.yaml file:

tomcat:
   environment:
      LC_ALL: 'en_US.UTF-8'

Before, it was LC_ALL: 'C', which was wrong.

Note: The setenv.sh solution from env 2 didn't work in the container!
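
To confirm that the settings took effect in a given env, the check from the question can be turned into a small self-test. This is a rough Java sketch under my own assumptions: the class name ListingCheck and the method unreachableEntries are hypothetical, and the directory to scan is passed as an argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical self-test: reports entries that listFiles() returns
// but exists() then denies, as in the question.
public class ListingCheck {

    /** Returns the names of listed entries for which exists() is false. */
    static List<String> unreachableEntries(File dir) {
        List<String> broken = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return broken; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (!entry.getAbsoluteFile().exists()) {
                broken.add(entry.getName());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("unreachable entries: " + unreachableEntries(dir));
    }
}
```

An empty list for a directory that contains files with umlauts suggests the JVM can round-trip those names. A non-empty list points to the same symptom as in the question.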

Upvotes: 1

g00se

Reputation: 4296

This is not an answer as such, but it allows me to show the problem with Unicode composition and file names. Let's create two files with the same name:

goose@t410:/tmp$ touch $(echo -e '\x61\xCC\x88.txt')
goose@t410:/tmp$ touch $(echo -e '\xC3\xA4.txt')
goose@t410:/tmp$ ls *.txt
ä.txt  ä.txt

What!? Hang on, this is a trick, isn't it? Are they really the same file? Here's proof they are different:

goose@t410:/tmp$ ls -i *.txt

131467 ä.txt 131527 ä.txt
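
The same distinction can be seen from Java. The two shell commands above use the byte sequences for 'a' followed by U+0308 (combining diaeresis) and for the single character U+00E4. The sketch below, with the hypothetical class name UmlautForms, shows that the two names are different strings until they are normalized with java.text.Normalizer. Whether normalizing a path before calling exists() would help in the question's case is an assumption that depends on which form is stored on disk.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

// Hypothetical demo of composed (NFC) versus decomposed (NFD) umlauts.
public class UmlautForms {

    /** Converts a string to the composed form (NFC). */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Renders the code points of a string, e.g. "U+0061 U+0308". */
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308.txt"; // bytes 61 CC 88 in UTF-8
        String composed = "\u00e4.txt";    // bytes C3 A4 in UTF-8

        System.out.println(codePoints(decomposed));
        System.out.println(codePoints(composed));
        System.out.println("equal as-is: " + decomposed.equals(composed));
        System.out.println("equal after NFC: " + toNfc(decomposed).equals(toNfc(composed)));
    }
}
```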

Upvotes: 0
