Reputation: 33
I am using alfresco download upload services using java.
When I upload the file to alfreco server it gives me the following path :
/app:Home/cm:Company_x0020_Home/cm:Abc/cm:TestFile/cm:V4/cm:BC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
When I use the same file path and download using alfresco services I took the file name at the end of the path
i.e ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
How can I remove or decode the [Unicode] characters in fileName
String decoded = URLDecoder.decode(queryString, "UTF-8");
The above does not work .
These are some Unicode characters which appeared in my file name. https://en.wikipedia.org/wiki/List_of_Unicode_characters
Please do not mark the question as duplicate as I have searched below links but non of those gave the solution. Following are the links that I have searched for replacing unicode charectors in String with java.
Java removing unicode characters
Remove non-ASCII characters from String in Java
How can I replace a unicode character in java string
Java Replace Unicode Characters in a String
Upvotes: 3
Views: 1406
Reputation: 1377
The solution given by Jeff Potts will be perfect . But i had a situation where i was using file name in diffrent project where i wont use org.alfresco related jars
I had to take all those dependencies to use for a simple file decoding So i used java native methods which uses regex to parse the file name and decode it,which gave me the perfect solution which was same from using
ISO9075.decode(test);
This is the code which can be used
public String decode_FileName(String fileName) {
System.out.println("fileName : " + fileName);
String decodedfileName = fileName;
String temp = "";
Matcher m = Pattern.compile("\\_x(.*?)\\_").matcher(decodedfileName); //rejex which matches _x0020_ kind of charectors
List<String> unicodeChars = new ArrayList<String>();
while (m.find()) {
unicodeChars.add(m.group(1));
}
for (int i = 0; i < unicodeChars.size(); i++) {
temp = unicodeChars.get(i);
if (isInteger(temp)) {
String replace_char = String.valueOf(((char) Integer.parseInt(String.valueOf(temp), 16)));//converting
decodedfileName = decodedfileName.replace("_x" + temp + "_", replace_char);
}
}
System.out.println("Decoded FileName :" + decodedfileName);
return decodedfileName;
}
And use this small java util to know Is integer
public static boolean isInteger(String s) {
try {
Integer.parseInt(s);
} catch (NumberFormatException e) {
return false;
} catch (NullPointerException e) {
return false;
}
return true;
}
So the above code works as simple as this :
Example :
0028 Left parenthesis U+0028 You can see in the link https://en.wikipedia.org/wiki/List_of_Unicode_characters
String replace_char = String.valueOf(((char) Integer.parseInt(String.valueOf("0028"), 16)));
System.out.println(replace_char);
This code gives output : (
which is a Left parenthesis
This is what the logic i have used in my java program.
The above program will give results same as ISO9075.decode(test)
Output :
fileName : ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
Decoded FileName :ABC1X 0400 0109-(1-2)_v2.pdf
Upvotes: 3
Reputation: 10538
In the org.alfresco.util package you will find a class called ISO9075. You can use it to encode and decode strings according to that spec. For example:
String test = "ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf";
String out = ISO9075.decode(test);
System.out.println(out);
Returns:
ABC1X 0400 0109-(1-2)_v2.pdf
If you want to see what it does behind the scenes, look at the source.
Upvotes: 2