bwest

Reputation: 9814

Interacting with files that have unicode characters in filename / escape sequence issues

I am trying to grab a handle to a file that has unicode characters in the filename.

For example, I have a file called c:\testø.txt. If I try new FileInfo("c:\testø.txt") I get an Illegal characters exception.

Trying again with an escape sequence: new FileInfo("c:\test\u00f8.txt") and it works! Yay!

So I've got a method to escape non-ASCII characters:

static string EscapeNonAsciiCharacters(string value)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in value)
    {
        if (c > 127)
        {
            // This character is too big for ASCII
            string encodedValue = "\\u" + ((int)c).ToString("x4");
            sb.Append(encodedValue);
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

But when I take the output from this method the escape characters seem to be incorrect.

EscapeNonAsciiCharacters("c:\testø.txt")  // => "c:\test\\u00f8.txt"

When I pass that output to the FileInfo constructor, I get the illegal chars exception again. However, the \ in c:\ seems to be unaltered. When I look at how this character is represented within the StringBuilder in the static method, I see: {c: est\u00f8.txt} which leads me to believe that the first backslash is being escaped differently.

How can I properly append the characters escaped by the loop in EscapeNonAsciiCharacters so I don't get the double escape character in my output?

Upvotes: 1

Views: 2006

Answers (2)

LVBen

Reputation: 2061

You seem to be misunderstanding escaped characters.

In this C# code, it is the compiler that converts the \u00f8 to the correct unicode character:

new FileInfo("c:\test\u00f8.txt") // (the "\t" is parsed as a tab character, and that is what actually causes an error here)

What you are doing here is just setting encodedValue to the six literal characters \u00f8 (a backslash, a u, and four hex digits), and nothing ever converts that escaped text into the actual character:

string encodedValue = "\\u" + ((int)c).ToString("x4");
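To make the difference concrete, here is a minimal sketch contrasting an escape the compiler resolves with the same text assembled at run time. The class and method names (EscapeDemo, CompileTime, BuiltAtRuntime) are hypothetical and exist only for illustration.

```csharp
using System;

static class EscapeDemo
{
    // The compiler replaces \u00f8 in the literal with the single character ø.
    public static string CompileTime() => "\u00f8";

    // "\\u" is an escaped backslash followed by 'u', so this produces the
    // six characters  \ u 0 0 f 8  and nothing decodes them afterwards.
    public static string BuiltAtRuntime(char c) => "\\u" + ((int)c).ToString("x4");
}
```

CompileTime() returns a string of length 1, while BuiltAtRuntime('ø') returns a string of length 6, which is why the two behave differently when passed to FileInfo.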

If you want to convert the escaped string, then you need to do something like this:

How to convert a string containing escape characters to a string
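One way to do that conversion, offered as a sketch rather than as what the linked question recommends, is Regex.Unescape from System.Text.RegularExpressions, which decodes \uXXXX sequences in a runtime string. The UnescapeDemo class and Decode method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnescapeDemo
{
    // Turns escaped text such as the six characters \u00f8 into the
    // single character it names.
    public static string Decode(string escaped) => Regex.Unescape(escaped);
}
```

Be careful with Windows paths: Regex.Unescape also decodes \t, so running it over the text c:\test would turn the \t into a tab. It is only suitable for text whose backslashes are all meant as escapes.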

Upvotes: 0

dodexahedron

Reputation: 4657

You have more escape sequences in those strings than you probably intend. Note that \ needs to be escaped in a regular (non-verbatim) string literal, because it is itself the escape character, and \t means tab.

Windows, using NTFS, is fully Unicode-capable, so the original error is most likely due to you not escaping the \ character.

I wrote a toy application to deal with the file named ʚ.txt, and the constructor has no problem with that or any other unicode characters.

So, instead of writing new FileInfo("c:\testø.txt"), you need to write new FileInfo("c:\\testø.txt") or new FileInfo(@"c:\testø.txt").
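A small sketch of the three spellings side by side may help; the PathLiteralDemo class and its method names are hypothetical and only illustrate how the compiler reads each literal.

```csharp
using System;

static class PathLiteralDemo
{
    // Backslash escaped explicitly: the string contains a real backslash.
    public static string Escaped() => "c:\\testø.txt";

    // Verbatim literal: backslashes are taken as written.
    public static string Verbatim() => @"c:\testø.txt";

    // Regular literal with an unescaped backslash: "\t" becomes a tab,
    // so the string is c: + TAB + estø.txt.
    public static string Broken() => "c:\testø.txt";
}
```

Escaped() and Verbatim() produce the same 12-character string, while Broken() is 11 characters long and contains a tab, which is not allowed in a Windows file name.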

Your escape function is entirely unnecessary in the context of C# in general and NTFS (or, really, most modern file systems). External libraries may, themselves, have incompatibilities with unicode, but that will need to be dealt with separately.

Upvotes: 3
