Reputation: 9814
I am trying to grab a handle to a file that has unicode characters in the filename.
For example, I have a file called c:\testø.txt. If I try new FileInfo("c:\testø.txt"), I get an illegal characters exception.
Trying again with an escape sequence, new FileInfo("c:\test\u00f8.txt"), it works! Yay!
So I've got a method to escape non-ASCII characters:
static string EscapeNonAsciiCharacters(string value)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in value)
    {
        if (c > 127)
        {
            // This character is too big for ASCII
            string encodedValue = "\\u" + ((int)c).ToString("x4");
            sb.Append(encodedValue);
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}
But when I take the output from this method the escape characters seem to be incorrect.
EscapeNonAsciiCharacters("c:\testø.txt") // => "c:\test\\u00f8.txt"
When I pass that output to the FileInfo constructor, I get the illegal characters exception again. However, the \ in c:\ seems to be unaltered. When I look at how this string is represented within the StringBuilder in the static method, I see {c: est\u00f8.txt}, which leads me to believe that the first backslash is being escaped differently.
How can I properly append the characters escaped by the loop in EscapeNonAsciiCharacters so I don't get the double escape character in my output?
Upvotes: 1
Views: 2006
Reputation: 2061
You seem to be misunderstanding escaped characters.
In this C# code, it is the compiler that converts the \u00f8 to the correct Unicode character:
new FileInfo("c:\test\u00f8.txt") // (the "\t" is actually causing an error here)
What you are doing here is just setting encodedValue to the string "\u00f8", and nothing ever converts that escaped text into the actual character:
string encodedValue = "\\u" + ((int)c).ToString("x4");
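To make the difference concrete, here is a small sketch. The class name EscapeDemo and its method names are made up for illustration. It compares a literal in which the compiler resolves \u00f8 with a string assembled at run time the same way the question's loop does it.

```csharp
using System;

public static class EscapeDemo
{
    // The compiler resolves \u00f8 inside the literal, so this is a single character.
    public static string CompilerResolved() => "\u00f8";

    // Assembled at run time: a backslash, a 'u', and four hex digits (six characters).
    public static string BuiltAtRuntime(char c) => "\\u" + ((int)c).ToString("x4");

    public static void Main()
    {
        Console.WriteLine(CompilerResolved().Length);            // 1
        Console.WriteLine((int)CompilerResolved()[0] == 0xF8);   // True
        Console.WriteLine(BuiltAtRuntime('\u00f8').Length);      // 6
    }
}
```

The second string is plain text that happens to look like an escape sequence; nothing at run time interprets it.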
If you want to convert the escaped string, then you need to do something like this:
How to convert a string containing escape characters to a string
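As a rough sketch of what such a conversion could look like (this is not the code from the linked question, and the names UnicodeUnescaper and Unescape are hypothetical), a regular expression can replace each \uXXXX sequence with the character it names. It deliberately handles only \uXXXX: Regex.Unescape also exists, but it would turn the \t in c:\test into a tab as well.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeUnescaper
{
    // Matches a literal backslash, the letter 'u', and exactly four hex digits.
    private static readonly Regex EscapePattern = new Regex(@"\\u([0-9a-fA-F]{4})");

    // Replaces every \uXXXX sequence in the text with the UTF-16 code unit it names.
    public static string Unescape(string value)
    {
        return EscapePattern.Replace(value, m =>
        {
            int codeUnit = int.Parse(m.Groups[1].Value,
                                     NumberStyles.HexNumber,
                                     CultureInfo.InvariantCulture);
            return ((char)codeUnit).ToString();
        });
    }

    public static void Main()
    {
        // Verbatim literal: the text really contains a backslash followed by u00f8.
        string escaped = @"c:\test\u00f8.txt";
        string unescaped = Unescape(escaped);

        Console.WriteLine(escaped.Length);    // 17
        Console.WriteLine(unescaped.Length);  // 12
    }
}
```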
Upvotes: 0
Reputation: 4657
You have more escaped in those strings than you probably intend.
Note that \ needs to be escaped when in a string literal, because it is itself the escape character, and \t means tab.
Windows, using NTFS, is fully Unicode-capable, so the original error is most likely due to you not escaping the \ character.
I wrote a toy application to deal with the file named ʚ.txt, and the constructor has no problem with that or any other unicode characters.
So, instead of writing new FileInfo("c:\testø.txt"), you need to write new FileInfo("c:\\testø.txt") or new FileInfo(@"c:\testø.txt").
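A quick way to check this yourself is sketched below. The class and method names are illustrative only. It shows that the escaped and verbatim literals produce the same string, while the unescaped literal silently contains a tab where \t was written.

```csharp
using System;
using System.IO;

public static class PathLiteralDemo
{
    // Backslash escaped explicitly.
    public static string Escaped() => "c:\\testø.txt";

    // Verbatim literal: backslashes are taken as-is.
    public static string Verbatim() => @"c:\testø.txt";

    // The literal from the question: \t is compiled into a tab character.
    public static string FromQuestion() => "c:\testø.txt";

    public static void Main()
    {
        Console.WriteLine(Escaped() == Verbatim());             // True
        Console.WriteLine(FromQuestion().IndexOf('\t') >= 0);   // True
        Console.WriteLine(Verbatim().IndexOf('\t') >= 0);       // False

        // Constructing a FileInfo does not require the file to exist.
        FileInfo info = new FileInfo(Verbatim());
        Console.WriteLine(info.Name);
    }
}
```

The tab in the third string is the likely source of the illegal characters exception; the ø itself is not the problem.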
Your escape function is entirely unnecessary in the context of C# in general and NTFS (or, really, most modern file systems). External libraries may, themselves, have incompatibilities with unicode, but that will need to be dealt with separately.
Upvotes: 3