xdevel2000

Reputation: 21364

Multibyte strings and ordinary strings

I don't understand whether, in C, every string is always a multibyte string, meaning that its characters are encoded as multibyte characters:

char s[] = "AAA"; 

char m[] = "X生";

Is s also a multibyte string, even though, unlike m, it doesn't contain a member of an extended character set?

I have this doubt because I read this in the libc manual:

“string” normally refers to multibyte character strings as opposed to wide character strings. Wide character strings are arrays of type wchar_t and as for multibyte character strings usually pointers of type wchar_t * are used.

So I don't understand whether "multibyte" refers to the number of bytes in the string or to the encoding, as opposed to a wide character string.

Upvotes: 2

Views: 5466

Answers (2)

Shafik Yaghmour

Reputation: 158449

So the C99 draft standard (C11 looks the same) defines multibyte character as follows:

sequence of one or more bytes representing a member of the extended character set of either the source or the execution environment

A multibyte character therefore represents a member of the extended character set, so s is not made up of multibyte characters.

Multibyte characters are further defined in section 5.2.1.2:

The source character set may contain multibyte characters, used to represent members of the extended character set. The execution character set may also contain multibyte characters, which need not have the same encoding as for the source character set. For both character sets, the following shall hold:

  • The basic character set shall be present and each character shall be encoded as a single byte.

  • The presence, meaning, and representation of any additional members is locale-specific.

  • A multibyte character set may have a state-dependent encoding, wherein each sequence of multibyte characters begins in an initial shift state and enters other locale-specific shift states when specific multibyte characters are encountered in the sequence. While in the initial shift state, all single-byte characters retain their usual interpretation and do not alter the shift state. The interpretation for subsequent bytes in the sequence is a function of the current shift state.

  • A byte with all bits zero shall be interpreted as a null character independent of shift state. Such a byte shall not occur as part of any other multibyte character.

Upvotes: 3

Lucas

Reputation: 14129

You can easily test how many bytes a string occupies. If I compile the following code on my machine:

char s[] = "AAA";
char m[] = "X生";
printf("s: %zu\n", sizeof(s));  /* sizeof yields a size_t, so use %zu rather than %d */
printf("m: %zu\n", sizeof(m));

I get the following output:

s: 4
m: 5

That means "s" isn't a multibyte string but "m" is. The sizes include the terminating null byte, and the result depends on your compiler and system, so I would run the same test there to check that they behave the same way.

Upvotes: 2
