animesh

Reputation: 237

Why does char take 2 bytes when it could be stored in one byte?

Can anybody tell me why, in C#, a char takes two bytes even though it could be stored in one byte? Don't you think it is a waste of memory? If not, then how is the extra byte used? In simple words, please make clear to me what the use of the extra 8 bits is!

Upvotes: 19

Views: 13109

Answers (5)

Thomas Levesque

Reputation: 292735

although it can be stored in one byte

What makes you think that?

It only takes one byte to represent every character in the English language, but other languages use other characters. Consider the number of different alphabets (Latin, Chinese, Arabic, Cyrillic...) and the number of symbols in each of them (not only letters and digits, but also punctuation marks and other special symbols)... there are tens of thousands of different symbols in use in the world! One byte is never going to be enough to represent them all; that's why the Unicode standard was created.

Unicode has several encodings (UTF-8, UTF-16, UTF-32...). .NET strings use UTF-16, which takes two bytes per code unit (not per code point). Of course, two bytes is still not enough to represent all the different symbols in the world; characters above U+FFFF are represented with surrogate pairs, i.e. two code units each.
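A quick sketch to illustrate this (the emoji character is just an example of a code point above U+FFFF):

```csharp
using System;

class CharSize
{
    static void Main()
    {
        // A C# char is one UTF-16 code unit: always 2 bytes.
        Console.WriteLine(sizeof(char));                    // 2

        // Code points above U+FFFF need a surrogate pair,
        // i.e. two chars in the string.
        string emoji = "\U0001F600";                        // 😀 (U+1F600)
        Console.WriteLine(emoji.Length);                    // 2 (two UTF-16 code units)
        Console.WriteLine(char.IsHighSurrogate(emoji[0]));  // True
    }
}
```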

Upvotes: 28

marinara

Reputation: 538

Because UTF-8 was probably still too young for Microsoft to consider using it.

Upvotes: -2

Michael Ames

Reputation: 2617

In C#, chars are 16-bit Unicode characters. Unicode supports a much larger character set than ASCII does.

If memory really is a concern, here is a good discussion on SO regarding how you might work with 8-bit chars: Is there a string type with 8 BIT chars?

References:

On C#'s char datatype: http://msdn.microsoft.com/en-us/library/x9h8tsay(v=vs.80).aspx

On Unicode: http://en.wikipedia.org/wiki/Unicode
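If you really do want one byte per character for ASCII-only text, one common approach (a sketch, not taken from the linked discussion) is to hold the data as a UTF-8 `byte[]` and only decode it to a `string` when needed:

```csharp
using System;
using System.Text;

class CompactText
{
    static void Main()
    {
        string s = "hello";

        // In memory, each char of the string costs 2 bytes...
        Console.WriteLine(s.Length * sizeof(char));         // 10

        // ...but ASCII text encoded as UTF-8 costs 1 byte per character.
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(utf8.Length);                     // 5

        // Decode back to a string when you need char-based APIs.
        Console.WriteLine(Encoding.UTF8.GetString(utf8));   // hello
    }
}
```

The trade-off is that you lose indexing by character and the rest of the `string` API until you decode, so this only pays off for large amounts of mostly-ASCII text.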

Upvotes: 0

Andrey Agibalov

Reputation: 7694

The char keyword is used to declare a Unicode character in the range U+0000 to U+FFFF. Unicode characters are 16-bit characters used to represent most of the known written languages throughout the world.

http://msdn.microsoft.com/en-us/library/x9h8tsay%28v=vs.80%29.aspx
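You can see that range directly, since a char is an unsigned 16-bit value (the Cyrillic letter is just an example):

```csharp
using System;

class CharRange
{
    static void Main()
    {
        // char covers U+0000 .. U+FFFF.
        Console.WriteLine((int)char.MinValue);  // 0
        Console.WriteLine((int)char.MaxValue);  // 65535

        // That range holds characters from many scripts, not just Latin.
        char cyrillic = 'Ж';                    // U+0416
        Console.WriteLine((int)cyrillic);       // 1046
    }
}
```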

Upvotes: 5

Joseph Marikle

Reputation: 78590

Unicode characters. True, we have enough room in 8 bits for the English alphabet, but when it comes to Chinese and other scripts, it takes a lot more characters.
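For example (the specific characters are just illustrations), a Chinese character fits in a single C# char but its value doesn't fit in one byte:

```csharp
using System;

class BeyondOneByte
{
    static void Main()
    {
        char han = '中';                // CJK ideograph U+4E2D
        Console.WriteLine((int)han);    // 20013 -- too big for one byte
        Console.WriteLine((int)'A');    // 65 -- would fit in one byte
    }
}
```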

Upvotes: 0
