Bass

Reputation: 5318

Non-ASCII characters in Java class names on HFS+ filesystem

According to the JLS, it is possible to "mangle" package names containing non-ASCII characters in case the host filesystem doesn't support Unicode. For instance, package é becomes @00e9, and papierMâché becomes papierM@00e2ch@00e9 when projected onto the file system.
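To make the mapping concrete, here is a minimal sketch of that escaping, assuming every non-ASCII UTF-16 code unit is written as @ followed by four lowercase hex digits. The class and method names (PackageNameMangler, mangle) are made up for illustration; this is not something the JDK exposes.

```java
// Hypothetical helper illustrating the '@' + four hex digits escaping.
final class PackageNameMangler {

    // Leaves ASCII characters alone and escapes every other UTF-16 code unit.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append('@').append(String.format("%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two package names from the question, written with Unicode escapes.
        System.out.println(mangle("\u00e9"));
        System.out.println(mangle("papierM\u00e2ch\u00e9"));
    }
}
```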

The question is: is it possible to achieve the same for Java source files (whose names must conform to the names of the corresponding Java classes)?

The background of the problem is that I need an e with an acute accent in my public class name ('é', '\u00e9'). Yes, I know I shouldn't, and that Unicode in file names is a malpractice, but I still need it.

However, either Mac OS X or the underlying HFS+ filesystem disallows this very character in file names, replacing it with 'e' immediately followed by COMBINING ACUTE ACCENT ("e\u0301"). This behaviour is totally different from NTFS or ext3/ext4, where two files named "\u00e9" and "e\u0301" can co-exist in the same directory (test repository is here).
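The two spellings are the composed (NFC) and decomposed (NFD) forms of the same character. A small sketch using java.text.Normalizer shows the difference; the class name NormalizationDemo and its helpers are made up for illustration, and HFS+ is assumed here to behave like standard NFD for this particular character.

```java
import java.text.Normalizer;

// Hypothetical demo class: compares the composed and decomposed spellings.
final class NormalizationDemo {

    // Decomposed form: base letter followed by combining marks.
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Composed form: a single precomposed code point where one exists.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00e9";     // what the class name contains
        String decomposed = "e\u0301";  // what the file name ends up containing

        // Plain string comparison treats them as different names.
        System.out.println(composed.equals(decomposed));
        // After normalising both to the same form they compare equal.
        System.out.println(toNfc(composed).equals(toNfc(decomposed)));
    }
}
```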

The above HFS+ behaviour results in 2 problems:

  1. I'm unable to compile my classes with javac because class name and file name are not the same (though I am able to compile them with either Maven or ecj).
  2. I can't have my classes managed with Git, as it always reports that the file has been renamed:


$ git status .
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   "src/main/java/com/intersystems/persistence/Cache\314\201ExtremeConnectionParameters.java"
#   "src/main/java/com/intersystems/persistence/Cache\314\201ExtremePersister.java"
#   "src/main/java/com/intersystems/persistence/Cache\314\201JdbcConnectionParameters.java"
#   "src/main/java/com/intersystems/persistence/Cache\314\201JdbcPersister.java"
#   "src/main/java/com/intersystems/persistence/ui/Cache\314\201JdbcConnectionParametersPanel.java"
nothing added to commit but untracked files present (use "git add" to track)
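The \314\201 in those paths is Git's octal quoting of the raw bytes 0xCC 0x81, which is the UTF-8 encoding of U+0301 (COMBINING ACUTE ACCENT). A rough sketch of decoding such a path follows; GitQuotedPath and unquote are made-up names, and only three-digit octal escapes are handled, not the other C-style escapes Git may emit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns backslash-octal escapes back into UTF-8 text.
final class GitQuotedPath {

    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    // Input is the path without the surrounding double quotes.
    static String unquote(String quoted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < quoted.length()) {
            char c = quoted.charAt(i);
            boolean escape = c == '\\'
                    && i + 3 < quoted.length()
                    && isOctal(quoted.charAt(i + 1))
                    && isOctal(quoted.charAt(i + 2))
                    && isOctal(quoted.charAt(i + 3));
            if (escape) {
                // Three octal digits encode one raw byte.
                out.write(Integer.parseInt(quoted.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                // Everything else is copied through as UTF-8.
                byte[] b = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = unquote("Cache\\314\\201JdbcPersister.java");
        // The sixth character is the combining accent, not a precomposed letter.
        System.out.println(Integer.toHexString(name.charAt(5)));
    }
}
```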

Upvotes: 3

Views: 878

Answers (1)

Paul Wagland

Reputation: 29134

If you want your file names to be ASCII-safe, you could name your Java file papierM@00e2ch@00e9.java and ensure that it gets compiled before any other class tries to reference it. This works because <filename>.java does not have to be <classname>.java; matching names are merely common practice. The compiler will not try to compile ClassA from ADifferentFilename.java, for obvious reasons, but once ADifferentFilename.java has been compiled to ClassA.class, references to ClassA will resolve. Note that javac does enforce the matching name for public top-level classes, so this applies only to classes that are not public.

Other than that, you are out of luck with respect to naming your files in pure ASCII.

As an aside, you mention that you have solved the Git problem by using a .gitignore file; however, you will probably find that a better way is to enable Git's core.precomposeunicode option:

git config --global core.precomposeunicode true

If you use this, then you should be able to have your file papierMâché.java and access it from all of Linux, Mac and Windows.

Upvotes: 1
