Reputation: 4737
I've got a bug with UTF-8 normalization:
as far as I understand, there are (at least) two ways to write an 'é' in UTF-8: 65 CC 81 (a plain 'e' followed by the combining acute accent, CC 81)
and C3 A9 (the single precomposed character).
[After a migration from Mac/OSX to a PC/Linux] I now have a conflict between the paths I store in my database and the actual file system structure, which prevents me from accessing my files correctly.
With the help of java.text.Normalizer, I worked out that in the FS I've got:
NFD true
NFC false
NFKD true
NFKC false
while in the database (and from the keyboard), I have:
NFD false
NFC true
NFKD false
NFKC true
Which of these four normalization forms should I comply with? How could I (automatically) fix the encoding of the file system directory names?
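For reference, a minimal sketch of the kind of check described above. The class and method names (`NormalizationForms`, `hex`, `report`) are made up for illustration; only `java.text.Normalizer` comes from the question. It prints the UTF-8 bytes of both spellings of 'é' and which forms each one already satisfies.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper class, not part of any library.
class NormalizationForms {

    // UTF-8 bytes of s as space-separated upper-case hex.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // One line per normalization form: form name, then true/false.
    static String report(String s) {
        StringBuilder sb = new StringBuilder();
        for (Normalizer.Form form : Normalizer.Form.values()) {
            sb.append(form).append(' ')
              .append(Normalizer.isNormalized(s, form)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // 'é' as one code point
        String decomposed = "e\u0301";  // 'e' + combining acute accent

        System.out.println(hex(precomposed));
        System.out.print(report(precomposed));
        System.out.println(hex(decomposed));
        System.out.print(report(decomposed));

        // Converting one spelling into the other:
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(nfc.equals(precomposed));
    }
}
```

The strings are written with `\u` escapes so that the result does not depend on how the source file itself is encoded.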
EDIT2: the problem is not at all what I thought it was at the beginning, hence everything below is struck out.
Do you know if there is any rule (RFC?) defining the handling of file:// URLs?
My concern is about the accents. I try to access a picture at
file:///other/Web/data/images/2005/2005-12-31 Fin d'année/IMGP0012.JPG
but it doesn't work. EDIT: of course it doesn't work with a raw é in the URL ...
However, Gumbo's suggestion
file:///other/Web/data/images/2005/2005-12-31%20Fin%20d'ann%C3%A9e/IMGP0012.JPG
doesn't work either, but (Firefox->Copy Link Location)
file:///other/Web/data/images/2005/2005-12-31%20Fin%20d%27anne%CC%81e
is okay.
Is there any standard way to access this data on the local file system, or shall I try all the available encodings?
(my code is written in Java and I test it with FF 3.6)
Upvotes: 1
Views: 1071
Reputation: 4737
I finally 'normalized' (renamed) my file system directories to match the names stored in the database. OSX messed everything up!
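A rough sketch of how such a rename could be automated, assuming the database names are in NFC (as reported in the question) and the tree now lives on a Linux file system. `NfcRenamer`, `toNfc` and `renameTreeToNfc` are hypothetical names, not the code I actually ran; try it on a copy of the data first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of any library.
class NfcRenamer {

    // NFC form of a single file or directory name.
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    // Renames every entry below root whose name is not already NFC.
    // The root itself is left alone. Returns the number of renames.
    static int renameTreeToNfc(Path root) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directory,
            // so a parent is only renamed after its contents are done.
            entries = walk.filter(p -> !p.equals(root))
                          .sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        }
        int renamed = 0;
        for (Path p : entries) {
            String name = p.getFileName().toString();
            String nfc = toNfc(name);
            if (!nfc.equals(name)) {
                // Fails with FileAlreadyExistsException if the NFC name is taken.
                Files.move(p, p.resolveSibling(nfc));
                renamed++;
            }
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NfcRenamer.java <root-directory>");
            return;
        }
        System.out.println(renameTreeToNfc(Paths.get(args[0])) + " entries renamed");
    }
}
```

The sketch assumes the JVM runs under a UTF-8 locale; with another locale, Java may not be able to represent the accented names at all.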
Upvotes: 1
Reputation: 655499
You need to percent-encode these characters. Try this:
file:///other/Web/data/images/2005/2005-12-31%20Fin%20d'ann%C3%A9e/IMGP0012.JPG
Here %C3%A9 represents the é encoded in UTF-8. You may need to use a different character encoding if your application expects something other than UTF-8.
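To see where %C3%A9 comes from, here is a minimal sketch that percent-encodes the UTF-8 bytes of a path by hand. `PercentEncodeDemo` and `percentEncodePath` are names made up for this example, and the set of characters left unescaped is a simplification (letters, digits, `-._~` and `/`). Encoding by hand means the bytes are shown exactly as they are in the string, so a decomposed é comes out as e%CC%81, which is the form in the link Firefox produced for the asker.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class PercentEncodeDemo {

    // Percent-encodes every UTF-8 byte of path except letters, digits,
    // '-', '.', '_', '~' and the '/' path separator.
    static String percentEncodePath(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean keep = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~'
                    || c == '/';
            if (keep) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String composed = "/2005-12-31 Fin d'ann\u00e9e";     // precomposed é
        String decomposed = "/2005-12-31 Fin d'anne\u0301e";  // e + combining accent

        System.out.println("file://" + percentEncodePath(composed));
        System.out.println("file://" + percentEncodePath(decomposed));
    }
}
```

Which of the two URLs opens depends on which byte sequence the directory name actually has on disk.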
Upvotes: 4