Reputation: 3
When I unzip a zip file in Java, I see strange behaviour with accented characters in the file names.
Console output:
Add File user : L'equipe Technique -- Folder : spec eval continue -- File Name : Capture d’écran 2013-05-29 à 17.24.03.png
If I print the String, I don't see any issue, but when I print its characters one by one, I get this:
C a p t u r e d ’ e ́ c r a n
Instead of:
C a p t u r e d ’ é c r a n
This causes trouble when writing the string to the database. I did not generate the archive myself, but I have no problem opening it with my OS tools. It may be an encoding issue, but I don't see how to solve it...
BufferedInputStream bis = new BufferedInputStream(is);
ArchiveInputStream ais = new ArchiveStreamFactory().createArchiveInputStream(bis);
ArchiveEntry entry = null;
// Iterate over the archive entries
while ((entry = ais.getNextEntry()) != null) {
    System.out.println("Test one");
    // Skip directories
    if (!entry.isDirectory()) {
        String[] filePath = entry.getName().split("/");
        List<String> filePathList = new ArrayList<String>();
        for (int i = 0; i < filePath.length; i++) {
            filePathList.add(filePath[i]);
        }
        // Get the folder that must contain the file
        Folder targetFolder = getTargetFolder(filePathList.subList(0, filePathList.size() - 1), rootFolder, user, scopeGroupId);
        // The last path segment is the file name
        String targetFileName = filePathList.get(filePathList.size() - 1);
        // Add the file: copy the entry's bytes into a buffer
        final int BUFFER = 2048;
        FileCacheOutputStream myFile = new FileCacheOutputStream();
        int count;
        byte[] data = new byte[BUFFER];
        while ((count = ais.read(data, 0, BUFFER)) != -1) {
            myFile.write(data, 0, count);
        }
        System.out.println("Add File user : " + user.getFullName() + " -- Folder : " + targetFolder.getName() + " -- File Name : " + targetFileName);
        addFile(user, targetFolder, targetFileName, myFile.getBytes());
    }
}
Upvotes: 0
Views: 976
Reputation: 34628
Accented characters can be expressed in more than one way in Unicode. You can have a precomposed é (a single code point, U+00E9), or a plain e followed by a combining acute accent (U+0065 followed by U+0301).
In your case, the file name uses the second, decomposed form. If your database collation doesn't take this into account, or the database does not store text as Unicode, this can become a problem.
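To make the difference visible, here is a small sketch that prints the code points of both forms. The class name AccentForms and the helper codePoints are made up for this illustration, and it assumes Java 8 or later for String.codePoints():

```java
public class AccentForms {
    // Precomposed form: a single code point, U+00E9.
    static final String PRECOMPOSED = "\u00E9";
    // Decomposed form: 'e' (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).
    static final String DECOMPOSED = "e\u0301";

    // Hypothetical helper: lists the code points of a string, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codePoints(PRECOMPOSED)); // U+00E9
        System.out.println(codePoints(DECOMPOSED));  // U+0065 U+0301
        // Both render as "é", but they are different strings to Java.
        System.out.println(PRECOMPOSED.equals(DECOMPOSED)); // false
    }
}
```

Both strings look identical when printed as a whole, which is why the problem only shows up when you print the characters one at a time or compare the strings.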
You can use the java.text.Normalizer class to convert between the two forms. For example, to get the precomposed form (NFC):
String normStr = Normalizer.normalize(origStr, Normalizer.Form.NFC);
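Applied to your case, you would probably normalize the entry name before passing it to addFile. The following is only a sketch: the class EntryNameNormalizer and the helper toNfc are hypothetical names, and the sample file name is a shortened version of the one in your output, written with escapes so the decomposed é is explicit:

```java
import java.text.Normalizer;

public class EntryNameNormalizer {
    // Hypothetical helper: returns the name in NFC (precomposed) form.
    // Names that are already in NFC are returned unchanged.
    static String toNfc(String name) {
        if (Normalizer.isNormalized(name, Normalizer.Form.NFC)) {
            return name;
        }
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "Capture d’écran.png" with a decomposed é (e + U+0301), as read from the archive.
        String fromArchive = "Capture d\u2019e\u0301cran.png";
        String fixed = toNfc(fromArchive);

        // The decomposed name is one char longer than the normalized one.
        System.out.println(fromArchive.length() + " -> " + fixed.length()); // 20 -> 19
        System.out.println(fixed.equals("Capture d\u2019\u00E9cran.png")); // true
    }
}
```

In your loop, that would mean something like targetFileName = toNfc(filePathList.get(filePathList.size() - 1)); whether the folder names need the same treatment depends on how getTargetFolder compares them.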
Upvotes: 1