Reputation: 1
I recently switched to Mac OS X and have run into a problem when creating directories in Terminal whose names contain special characters.
Basically, what happens is this:
When I create a directory té$t in Finder and copy a file into it, I can access the file in both Terminal and Finder.
When I create the same directory té$t in Terminal and put a file in it, I can access the file in Terminal, but in Finder I get an error that the file can't be found. When I rename the directory in Terminal to remove the special characters, I can access the file in Finder.
:> locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
This has to be some sort of encoding thing, right?
UPDATE:
Forget the $ and use only the é (or any other accented character such as á, ì, ç). The folder is created; it's only that I can't access any files in it from Finder when it's created in the terminal.
Upvotes: 0
Views: 4961
Reputation: 125718
In Unicode, accented characters like é can generally be represented in two different ways: "precomposed", as a single code point representing the accented letter, or "decomposed", as a series of code points representing the unaccented letter followed by a combining accent (or even more than one). In the case of "é", the precomposed form is U+00E9 = UTF-8 0xC3A9 = "Latin small letter e with acute accent", and the decomposed form is U+0065 U+0301 = UTF-8 0x65CC81 = "Latin small letter e" + "combining acute accent".
When you type the filename into the terminal, you're typing it in precomposed form; but the Mac OS Extended filesystem stores filenames in decomposed form (with some exceptions that aren't relevant here). When you specify a filename with precomposed characters in it, the filesystem will decompose them for storage. Net result: when you try to use the file later, you're trying to access it with a name that's equivalent to -- but not identical to -- the file's actual name. Depending on exactly how you access it, the equivalency may or may not be handled right, so the file may or may not be found.
In general, the filesystem handles equivalences like this correctly, but the shell and other programs don't know the details of filesystem encoding and hence get it wrong. So if the shell or other program simply passes the name to the filesystem code, it works; but if it tries to figure out for itself whether the file exists, it fails. For example, touch "tést"; [ -e "tést" ] uses the filesystem to find out whether "tést" exists, and will find it; but tab-completion of té is handled by the shell, and will fail. See this apple.se question.
Upvotes: 4
Reputation: 753525
I was unable to reproduce your problem on my Mac, but I haven't gone juggling my locale in the way you have.
$ mkdir weird
$ cd weird
$ mkdir naïve résumé touché
$ for d in *; do cp ../q7.c $d/$d.c; done
$ ls -l *
naïve:
total 8
-rw-r----- 1 jleffler staff 990 Nov 19 07:30 naïve.c
résumé:
total 8
-rw-r----- 1 jleffler staff 990 Nov 19 07:30 résumé.c
touché:
total 8
-rw-r----- 1 jleffler staff 990 Nov 19 07:30 touché.c
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
$
I happened to have some source in a file q7.c lying around; this created a series of 'accented' directories, each containing an accented file. The command-line tools have no problems.
This is where I can only demonstrate with images, I think:
That should show Finder looking at the file naïve.c in the folder naïve. I was able to click on the file in Finder and it opened in Xcode.
First, try setting your locale to en_US.UTF-8 and see whether that makes any difference.
If, perchance, it does, then I'd hypothesize that you are creating file names using Latin 1, but Finder runs with UTF-8. The trouble is then that your file names are not valid UTF-8 file names. That could prevent Finder from working.
Here's a program that goes around trying to abuse the system:
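#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    /* Raw bytes that are valid Latin 1 but not well-formed UTF-8 */
    char name[] = "\xC0\xC1\xC2\xC3\xC4\xC5\xC6";
    char file[] = "weird.c";
    /*
    C0 U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
    C1 U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
    C2 U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    C3 U+00C3 LATIN CAPITAL LETTER A WITH TILDE
    C4 U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
    C5 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    C6 U+00C6 LATIN CAPITAL LETTER AE
    */

    /* Create a directory whose name is the raw byte sequence */
    if (mkdir(name, 0755) != 0)
    {
        fprintf(stderr, "mkdir(%s) failed\n", name);
        return(1);
    }

    /* Build the path name/name.c for a file inside that directory */
    char buffer[32];
    snprintf(buffer, sizeof(buffer), "%s/%s.c", name, name);

    /*
    ** NB: this opens `name` (the directory just created), not the path in
    ** `buffer`. Opening a directory for writing fails, so this call is the
    ** one that fails in the run shown below; pass `buffer` here to attempt
    ** creating the file inside the directory.
    */
    FILE *ofp = fopen(name, "w");
    if (ofp == 0)
    {
        fprintf(stderr, "fopen(%s) failed\n", buffer);
        return(1);
    }

    /* Copy weird.c into the output file */
    FILE *ifp = fopen(file, "r");
    if (ifp == 0)
    {
        fprintf(stderr, "fopen(%s) failed\n", file);
        return(1);
    }
    size_t nbytes;
    while ((nbytes = fread(buffer, 1, sizeof(buffer), ifp)) != 0)
        fwrite(buffer, 1, nbytes, ofp);
    fclose(ifp);
    fclose(ofp);
    return 0;
}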
As you probably know, bytes 0xC0 and 0xC1 can never appear in well-formed UTF-8. The other bytes are legitimate start bytes for 2-byte UTF-8 characters, but the following byte should always be in the range 0x80..0xBF. Clearly, the names are not well-formed UTF-8.
Osiris JL: make weird
gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror weird.c -o weird
Osiris JL: ls -l
total 40
-rw-r----- 1 jleffler staff 1629 Nov 19 07:54 makefile
drwxr----- 4 jleffler staff 136 Nov 19 07:36 naïve
drwxr----- 4 jleffler staff 136 Nov 19 07:36 résumé
drwxr----- 4 jleffler staff 136 Nov 19 07:36 touché
-rwxr----- 1 jleffler staff 9068 Nov 19 08:00 weird
-rw-r----- 1 jleffler staff 1142 Nov 19 07:59 weird.c
drwxr----- 3 jleffler staff 102 Nov 19 08:00 weird.dSYM
Osiris JL: ./weird
fopen(???????/???????.c) failed
Osiris JL: ls -l
total 40
drwxr----- 2 jleffler staff 68 Nov 19 08:00 %C0%C1%C2%C3%C4%C5%C6
-rw-r----- 1 jleffler staff 1629 Nov 19 07:54 makefile
drwxr----- 4 jleffler staff 136 Nov 19 07:36 naïve
drwxr----- 4 jleffler staff 136 Nov 19 07:36 résumé
drwxr----- 4 jleffler staff 136 Nov 19 07:36 touché
-rwxr----- 1 jleffler staff 9068 Nov 19 08:00 weird
-rw-r----- 1 jleffler staff 1142 Nov 19 07:59 weird.c
drwxr----- 3 jleffler staff 102 Nov 19 08:00 weird.dSYM
Osiris JL: rmdir *C6
Osiris JL: ./weird 2>&1 | odx
0x0000: 66 6F 70 65 6E 28 C0 C1 C2 C3 C4 C5 C6 2F C0 C1 fopen(......./..
0x0010: C2 C3 C4 C5 C6 2E 63 29 20 66 61 69 6C 65 64 0A ......c) failed.
0x0020:
Osiris JL:
So, I was able to create the directory, after a fashion, and I can remove it using rmdir *C6. The name is not what I'd expect. The file could not be created.
Upvotes: 1