Harry Leech

Reputation: 1

Mac OSX terminal folder names with special characters

I recently switched to Mac OS X and have run into a problem when creating directories in Terminal whose names contain special characters.

Basically what happens is this:

This has to be some sort of encoding thing, right?

UPDATE:

Forget the $ and use only the é instead (or any other accented character: á, ì, ç, etc.). The folder is created; the problem is only that I can't access any files in it from Finder when it was created in the terminal.

Upvotes: 0

Views: 4961

Answers (2)

Gordon Davisson

Reputation: 125718

In Unicode, accented characters like é can generally be represented in two different ways: "precomposed", as a single code point representing the accented letter, or "decomposed", as a series of code points representing the unaccented letter followed by a combining accent (or even more than one). In the case of "é", the precomposed form is U+00E9 = UTF-8 0xC3A9 = "Latin small letter e with acute accent", and the decomposed form is U+0065 U+0301 = UTF-8 0x65CC81 = "Latin small letter e" + "combining acute accent".

When you type the filename into the terminal, you're typing it in precomposed form; but the Mac OS Extended filesystem stores filenames in decomposed form (with some exceptions that aren't relevant here). When you specify a filename with precomposed characters in it, the filesystem will decompose them for storage. Net result: when you try to use the file later, you're trying to access it with a name that's equivalent to -- but not identical to -- the file's actual name. Depending on exactly how you access it, the equivalency may or may not be handled right, so the file may or may not be found.

In general, the filesystem handles equivalencies like this correctly, but the shell and other programs don't know the details of filesystem encoding and hence get it wrong. So if the shell or other program simply passes the name to the filesystem code, it works; but if the program tries to work out for itself whether the file exists, it fails. For example, touch "tést"; [ -e "tést" ] uses the filesystem to find out whether "tést" exists, and will find it; but tab-completion is handled by the shell, and will fail. See this apple.se question.

Upvotes: 4

Jonathan Leffler

Reputation: 753525

I was unable to reproduce your problem on my Mac, but I haven't gone juggling my locale in the way you have.

Terminal

$ mkdir weird
$ cd weird
$ mkdir naïve résumé touché
$ for d in *; do cp ../q7.c $d/$d.c; done
$ ls -l *
naïve:
total 8
-rw-r-----  1 jleffler  staff  990 Nov 19 07:30 naïve.c

résumé:
total 8
-rw-r-----  1 jleffler  staff  990 Nov 19 07:30 résumé.c

touché:
total 8
-rw-r-----  1 jleffler  staff  990 Nov 19 07:30 touché.c
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
$ 

I happened to have some source in a file q7.c lying around; this created a series of 'accented' directories, each containing an accented file. The command line tools have no problems.

Finder

This is where I can only demonstrate with images, I think:

[image: Finder screenshot]

That should show Finder looking at the file naïve.c in the folder naïve. I was able to click on the file in Finder, and it opened in Xcode:

[image: Xcode screenshot]

Suggestion

First, try setting your locale to en_US.UTF-8 and see whether that makes any difference.

If, perchance, it does, then I'd hypothesize that you are creating file names using Latin 1, but Finder runs with UTF-8. The trouble is then that your file names are not valid UTF-8 file names. That could prevent Finder from working.

Here's a program that goes around trying to abuse the system:

#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    char name[] = "\xC0\xC1\xC2\xC3\xC4\xC5\xC6";
    char file[] = "weird.c";

    /*
    C0 U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
    C1 U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
    C2 U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    C3 U+00C3 LATIN CAPITAL LETTER A WITH TILDE
    C4 U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
    C5 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    C6 U+00C6 LATIN CAPITAL LETTER AE
    */

    if (mkdir(name, 0755) != 0)
    {
        fprintf(stderr, "mkdir(%s) failed\n", name);
        return(1);
    }
    char buffer[32];
    snprintf(buffer, sizeof(buffer), "%s/%s.c", name, name);
    /* Note: this opens name (the directory created above), not the path in buffer. */
    FILE *ofp = fopen(name, "w");
    if (ofp == 0)
    {
        fprintf(stderr, "fopen(%s) failed\n", buffer);
        return(1);
    }
    FILE *ifp = fopen(file, "r");
    if (ifp == 0)
    {
        fprintf(stderr, "fopen(%s) failed\n", file);
        return(1);
    }
    size_t nbytes;

    while ((nbytes = fread(buffer, 1, sizeof(buffer), ifp)) != 0)
        fwrite(buffer, 1, nbytes, ofp);
    fclose(ifp);
    fclose(ofp);
    return 0;
}

As you probably know, bytes 0xC0 and 0xC1 can never appear in well-formed UTF-8. The other bytes are legitimate start bytes for 2-byte UTF-8 characters, but each must be followed by a continuation byte in the range 0x80..0xBF, and none of them is here. Clearly, the names are not well-formed UTF-8.

Osiris JL: make weird
    gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror weird.c -o weird  
Osiris JL: ls -l
total 40
-rw-r-----  1 jleffler  staff  1629 Nov 19 07:54 makefile
drwxr-----  4 jleffler  staff   136 Nov 19 07:36 naïve
drwxr-----  4 jleffler  staff   136 Nov 19 07:36 résumé
drwxr-----  4 jleffler  staff   136 Nov 19 07:36 touché
-rwxr-----  1 jleffler  staff  9068 Nov 19 08:00 weird
-rw-r-----  1 jleffler  staff  1142 Nov 19 07:59 weird.c
drwxr-----  3 jleffler  staff   102 Nov 19 08:00 weird.dSYM
Osiris JL: ./weird
fopen(???????/???????.c) failed
Osiris JL: ls -l
total 40
drwxr-----  2 jleffler  staff    68 Nov 19 08:00 %C0%C1%C2%C3%C4%C5%C6
-rw-r-----  1 jleffler  staff  1629 Nov 19 07:54 makefile
drwxr-----  4 jleffler  staff   136 Nov 19 07:36 naïve
drwxr-----  4 jleffler  staff   136 Nov 19 07:36 résumé
drwxr-----  4 jleffler  staff   136 Nov 19 07:36 touché
-rwxr-----  1 jleffler  staff  9068 Nov 19 08:00 weird
-rw-r-----  1 jleffler  staff  1142 Nov 19 07:59 weird.c
drwxr-----  3 jleffler  staff   102 Nov 19 08:00 weird.dSYM
Osiris JL:  rmdir *C6
Osiris JL: ./weird 2>&1 | odx
0x0000: 66 6F 70 65 6E 28 C0 C1 C2 C3 C4 C5 C6 2F C0 C1   fopen(......./..
0x0010: C2 C3 C4 C5 C6 2E 63 29 20 66 61 69 6C 65 64 0A   ......c) failed.
0x0020:
Osiris JL: 

So, I was able to create the directory, after a fashion, and I could remove it again using rmdir *C6. The name is not what I'd expect, though, and the file could not be created.

Upvotes: 1
