Reputation: 5362
I am trying to create a zip with folders inside it and I have to sanitize the folder names against any illegal characters. I did some googling around and found this method from http://www.rgagnon.com/javadetails/java-0662.html:
public static String sanitizeFilename(String name) {
    return name.replaceAll("[\\\\/:*?\"<>|]", "-");
}
However, upon testing I get some weird results. For example, name = filename£/?e>"e should, from my understanding, return filename£--e--e. Instead it returns filename-ú--e--e
Why is this so?
Please note that I am testing this by opening the downloaded zip file in WinZip and looking at the folder name that is created. I can't get the pound sign to appear. I've also tried this:
public static String sanitizeFilename(String name) {
    name = name.replaceAll("[£]", "\u00A3");
    return name.replaceAll("[\\\\/:*?\"<>|]", "-");
}
EDIT: Some more research and I found this: http://illegalargumentexception.blogspot.co.uk/2009/04/i18n-unicode-at-windows-command-prompt.html It appears to have to do with locale, Windows version and encoding. I'm not sure how I can overcome this within the code.
Upvotes: 0
Views: 2682
Reputation: 48404
I think it depends on which encoding is used when the file name is actually read. If that encoding does not match the one the name was written in, the £ symbol might get corrupted.
As an example not fitting your case exactly, reading UTF-8-encoded £ as an ISO Latin 1-encoded character would return Â£.
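To see that mismatch in isolation, here is a minimal sketch that encodes the pound sign as UTF-8 and decodes the bytes as ISO Latin 1. The class and method names (MojibakeDemo, misdecode) are made up for illustration, and the pound sign is written as \u00A3 so the encoding of the source file itself cannot interfere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encode text with one charset and decode the bytes with another,
    // which is what a reader with the wrong encoding effectively does.
    static String misdecode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign

        // UTF-8 stores the pound sign as two bytes (0xC2 0xA3);
        // ISO Latin 1 turns each byte into its own character.
        String wrong = misdecode(pound, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        String right = misdecode(pound, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(wrong.length());      // 2
        System.out.println(right.equals(pound)); // true
    }
}
```

The two resulting characters are \u00C2 followed by \u00A3. A different mismatched charset would produce a different pair, so the exact garbage you see hints at which encoding the reading side assumed.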
Check the file's encoding (ISO Latin 1 and UTF-8 are the most common), then pass the matching charset to your Reader.
As a snippet, you may want to consider this example:
BufferedReader br = new BufferedReader(
    new InputStreamReader(
        new FileInputStream(new File("yourTextFile")),
        "[your file's encoding]"
    )
);
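If the name is already correct in memory and only breaks once it is inside the archive, the same idea may apply on the writing side: since Java 7, ZipOutputStream and ZipInputStream accept an explicit Charset for entry names. The sketch below is an assumption about your setup, not a confirmed fix. It writes a folder entry with UTF-8 names and reads it back with the same charset. The class name ZipNameRoundTrip and its helper methods are hypothetical, and whether WinZip honours UTF-8 entry names depends on its version and settings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipNameRoundTrip {
    // Same replacement as in the question: illegal characters become '-'.
    static String sanitizeFilename(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|]", "-");
    }

    // Builds an in-memory zip holding one folder entry, with entry names encoded as UTF-8.
    static byte[] zipWithFolder(String folderName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            // A trailing '/' marks the entry as a directory.
            zip.putNextEntry(new ZipEntry(sanitizeFilename(folderName) + "/"));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the first entry name back, decoding it with the same charset.
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String raw = "filename\u00A3/?e>\"e"; // the example name from the question
        String stored = firstEntryName(zipWithFolder(raw));
        System.out.println(stored.equals("filename\u00A3--e--e/")); // true
    }
}
```

The round trip only shows that Java writes and reads the name consistently. If the archive tool still shows a corrupted name, it is probably decoding the entry names with a different code page.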
Upvotes: 3