Marc

Reputation: 3

Java unzip strange character (encoding?)

When I unzip a zip file in Java, I see strange behaviour with accented characters in the file names.

Console output:

Add File user : L'equipe Technique -- Folder : spec eval continue -- File Name : Capture d’écran 2013-05-29 à 17.24.03.png

If I print the String, nothing looks wrong, but when I print the characters of the String one by one, I get this:

C a p t u r e d ’ e ́ c r a n

Instead of :

C a p t u r e d ’ é c r a n

This causes trouble when writing the string to the database. I did not generate the archive, but I have no problem opening it with my OS tools. It may be an encoding issue, but I don't see how to solve it.

BufferedInputStream bis = new BufferedInputStream(is);
ArchiveInputStream ais = new ArchiveStreamFactory().createArchiveInputStream(bis);

ArchiveEntry entry = null;
// Iterate over the archive entries
while ((entry = ais.getNextEntry()) != null) {
    System.out.println("Test one");
    // Skip directories
    if (!entry.isDirectory()) {
        String[] filePath = entry.getName().split("/");
        List<String> filePathList = new ArrayList<String>();
        for (int i = 0; i < filePath.length; i++) {
            filePathList.add(filePath[i]);
        }

        // Get the folder that should contain the file
        Folder targetFolder = getTargetFolder(filePathList.subList(0, filePathList.size() - 1), rootFolder, user, scopeGroupId);

        String targetFileName = filePathList.get(filePathList.size() - 1);

        // Add the file
        final int BUFFER = 2048;

        FileCacheOutputStream myFile = new FileCacheOutputStream();
        int count;
        byte[] data = new byte[BUFFER];
        while ((count = ais.read(data, 0, BUFFER)) != -1) {
            myFile.write(data, 0, count);
        }
        System.out.println("Add File user : " + user.getFullName() + " -- Folder : " + targetFolder.getName() + " -- File Name : " + targetFileName);
        addFile(user, targetFolder, targetFileName, myFile.getBytes());
    }
}

Upvotes: 0

Views: 976

Answers (1)

RealSkeptic

Reputation: 34628

Accented characters can be expressed in more than one way in Unicode. You can have a precomposed é (U+00E9), or a plain e followed by a combining acute accent (U+0301).

In your case, the file name uses the second form (e followed by a combining accent). If your database collation doesn't take this into account, or the database does not store text as Unicode, it may become a problem.
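To make the difference between the two forms visible, here is a minimal sketch that prints the code points of each. The class name `UnicodeFormsDemo` and the helper `codePoints` are made up for illustration, and the code assumes Java 8 or later (for `String.codePoints()`).

```java
public class UnicodeFormsDemo {

    /** Formats each code point of s as U+XXXX, separated by single spaces. */
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the demo does not depend on the source file encoding.
        String precomposed = "\u00e9";  // é as a single code point
        String decomposed = "e\u0301";  // e followed by COMBINING ACUTE ACCENT

        System.out.println(codePoints(precomposed));        // U+00E9
        System.out.println(codePoints(decomposed));         // U+0065 U+0301
        System.out.println(precomposed.equals(decomposed)); // false
    }
}
```

Both strings render as the same glyph, but `equals` compares the underlying chars, so a lookup or uniqueness check on the raw value can treat them as different names.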

You can use the java.text.Normalizer class to convert between the two forms. For example, NFC composes the sequence into the single precomposed character:

String normStr = Normalizer.normalize(origStr, Normalizer.Form.NFC);
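As a slightly fuller sketch, the normalization could be wrapped in a small helper and applied to the entry name before it is stored. The class `FileNameNormalizer` and the method `toNfc` are hypothetical names, not part of the question's code. In the question's loop, the call would presumably go on `targetFileName` just before `addFile(...)`.

```java
import java.text.Normalizer;

public class FileNameNormalizer {

    /**
     * Returns name in NFC (composed) form.
     * Null input and already-normalized strings are returned as-is.
     */
    public static String toNfc(String name) {
        if (name == null || Normalizer.isNormalized(name, Normalizer.Form.NFC)) {
            return name;
        }
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Same text as in the question, written with escapes:
        // U+2019 is the curly apostrophe, U+0301 the combining acute accent.
        String fromZip = "Capture d\u2019e\u0301cran";
        String fixed = toNfc(fromZip);

        // The decomposed name has one more char than the composed one.
        System.out.println(fromZip.length() + " -> " + fixed.length()); // 16 -> 15
    }
}
```

Whether NFC is the right target depends on what the database and the rest of the application expect. It matches the precomposed form shown in the question's expected output.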

Upvotes: 1
