ScruffyDuck

Reputation: 2666

How does WChar relate to Unicode and ASCII

I am about to show my total ignorance of how encoding works and different string formats.

I am passing a string to a compiler (Microsoft's, as it happens, and for their Flight Simulator). The string is passed as part of an XML document which is used as the source for the compiler. This is created using standard .NET strings. I have not needed to specify any encoding or type explicitly, since the XML is just text.

The string is just a collection of characters. This is an example of one that gives the error:

ARG, AFL, AMX, ACA, DAH, CCA, AEL, AGN, MAU, SEY, TSC, AZA, AAL, ANA, BBC, CPA, CAL, COA, CUB, DAL, UGX, ELY, UAE, ERT, ETH, EEZ, GHA, IRA, JAL, NWA, KAL, KAC, LAN, LDI, MAS, MEA, PIA, QTR, RAM, RJA, SVA, SIA, SWR, ROT, THA, THY, AUI, UAL, USA, ACA, TAR, UZB, IYE, QFA

If I create the string using my C# managed program then there is no issue. However, this string is coming from a C++ program that can create the compiled file using its own compiler, which is not compliant with the MS one.

The MS compiler does not like the string. It throws two errors:

INTERNAL COMPILER ERROR: #C2621: Couldn't convert WChar string!
INTERNAL COMPILER ERROR: #C2029: Failed to convert attribute value from UNICODE!

Unfortunately there is no useful documentation with the compiler on its errors. We just make the best of what we see!

I have seen other errors of this type but these contain hidden characters and control characters that I can trap and remove.

In this case I looked at the string as a Char[] and could not see anything unusual - only what I expected. No values above the ASCII limit of 127 and no control characters.

I understand that WChar is something that C++ understands (but I don't), Unicode is a two byte representation of characters and ASCII is a one byte representation.

I would like to do two things - first identify a string that will fail if passed to the compiler and second fix the string. I assume the compiler is expecting ASCII.

EDIT

I told an untruth - in fact I do use encoding. I checked the code I used to convert a byte array into a string.

public static string Bytes2String(byte[] bytes, int start, int length)
{
    return Encoding.Default.GetString(bytes, start, length);
}

I realized that Default might be an issue but changing it to ASCII makes no difference. I am beginning to believe that the error message is not what it seems.
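One way to confirm whether the choice of encoding can matter at all is to scan the raw bytes before decoding: if nothing is above 127, then ASCII, UTF-8, and most Default code pages will all decode them identically. This is a minimal sketch of such a check (the method name is my own, not from the original code):

```csharp
using System.Diagnostics;

static class ByteCheck
{
    // Returns the index of the first byte outside 7-bit ASCII,
    // or -1 if every byte is plain ASCII (0-127).
    public static int FirstNonAsciiIndex(byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] > 127)
                return i;
        }
        return -1;
    }
}
```

If this returns -1 for the failing byte array, the problem almost certainly lies elsewhere than the encoding.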

Upvotes: 3

Views: 3521

Answers (2)

ScruffyDuck

Reputation: 2666

I have to come clean that the compiler error has nothing to do with the encoding format of the string. It turns out that it is the length of the string that is at fault. As per the sample, there are a number of entries separated by commas. The compiler throws the rather unhelpful messages if the entry count exceeds 50.
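Given that finding, a simple pre-check on the entry count can catch the problem before the string ever reaches the compiler. This is just a sketch - the 50-entry limit is what I observed by testing, not anything documented, and the method name is my own:

```csharp
using System.Diagnostics;

static class AttributeCheck
{
    // Limit observed by testing against the MS compiler; not documented.
    const int MaxEntries = 50;

    // Returns true if the comma-separated list has a safe number of entries.
    public static bool IsEntryCountOk(string value)
    {
        int count = value.Split(',').Length;
        return count <= MaxEntries;
    }
}
```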

However, thanks everyone for your help - it has raised the issue of encoding in my mind and I will now look at it much more carefully.

Upvotes: 0

Ergwun

Reputation: 12978

It looks like you are taking a byte array, and converting it as a string using the encoding returned by Encoding.Default.

The Microsoft documentation recommends against doing this, since Encoding.Default varies from machine to machine.

You need to work out what encoding is being used in the C++ program to generate the byte array, and use the same one (or a compatible one) to convert the byte array back to a string again in the C# code. E.g. if the byte array is using ASCII encoding, you could use:

System.Text.Encoding.ASCII.GetString(bytes, start, length);

or

System.Text.Encoding.UTF8.GetString(bytes, start, length);

P.S. I hope Joel doesn't catch you ;)

Upvotes: 2
