Josh Withee

Reputation: 11386

SQL Server - does VARCHAR length indicate max number of characters or number of bytes?

For a SQL Server column of type

VARCHAR(100)

Before I insert into the column, I want to make sure that the value being inserted does not exceed 100. Does that number specify the maximum number of characters the column can store, or the number of bytes of character data it can store?

The reason I'm asking is that some Unicode characters use more than one byte. Therefore, a 100-character string encoded in Unicode could take up more than 100 bytes.
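For example (a quick C# sketch, using an accented character purely for illustration), the character count and the encoded byte count of the same string can differ:

    using System;
    using System.Text;

    class ByteCountDemo
    {
        static void Main()
        {
            // 'é' is one character, but it takes 2 bytes in UTF-8 and 2 bytes in UTF-16
            var s = new string('é', 100);   // a 100-character string

            Console.WriteLine(s.Length);                          // 100 characters
            Console.WriteLine(Encoding.UTF8.GetByteCount(s));     // 200 bytes as UTF-8
            Console.WriteLine(Encoding.Unicode.GetByteCount(s));  // 200 bytes as UTF-16
        }
    }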

Since varchar is for ASCII encoding, is it possible for any ASCII character to take up more than one byte (which might require checking the byte length)?

(EDIT: Based on feedback I got on the question, I see that varchar should be used for ASCII and nvarchar for Unicode.)

Upvotes: 2

Views: 3499

Answers (3)

Josh Hull

Reputation: 1783

I learned something from researching this question!

In SQL Server

  1. nvarchar takes double the storage because it uses the two-byte UNICODE UCS-2 character set.

n defines the string length ... The storage size, in bytes, is two times the actual length of data entered + 2 bytes.

This tells me that the length specified for nvarchar is most certainly the number of characters, not the bytes.

  2. varchar uses one byte per character and stores single-byte non-Unicode character data.

n defines the string length ... The storage size is the actual length of the data entered + 2 bytes.

I would infer from those two statements that the number indicated for the length of the varchar or nvarchar column is indeed the number of characters.

The phrase length of the data entered is somewhat ambiguous, but from the two descriptions I think it's reasonable to conclude that they mean the number of characters entered.
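As a rough sanity check of those formulas (just a sketch; the helper methods below are mine, translating the quoted storage rules into code):

    using System;

    class StorageSizeSketch
    {
        // Storage rules quoted from the docs above:
        //   varchar:  actual length in characters + 2 bytes of overhead
        //   nvarchar: 2 x actual length in characters + 2 bytes of overhead
        static int VarcharStorageBytes(int characters) => characters + 2;
        static int NvarcharStorageBytes(int characters) => 2 * characters + 2;

        static void Main()
        {
            Console.WriteLine(VarcharStorageBytes(100));   // 102 bytes for varchar(100) holding 100 characters
            Console.WriteLine(NvarcharStorageBytes(100));  // 202 bytes for nvarchar(100) holding 100 characters
        }
    }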

If you have the potential for receiving and storing two-byte character data, always choose nvarchar over varchar, even though performance may take a hit. The linked question and answers are helpful to see why.

The bottom line is that SQL Server is expressing the length of the varchar and nvarchar column as the number of characters entered. It will take care of the storage for you. Don't worry about bytes!

NOTE: Adding to the confusion is that Oracle allows you to specify either a byte length or a character length in its native type VARCHAR2:

Oracle VARCHAR2

With the increasing use of multi-byte character sets to support globalized databases comes the problem of bytes no longer equating to characters.

The VARCHAR2 and CHAR types support two methods of specifying lengths:

In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character set.

In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.

And it appears that the default is bytes!

This seems to be creating confusion for more than just us:

Oracle varchar2 - bytes or chars

So if you're coming from an Oracle world, you might assume this is true everywhere. And if you're coming from a SQL Server world, you might not realize this is the case!

Back in SQL Server

The thing that confuses me is that UTF-8 encoded characters can take up to 4 bytes each, and many take as few as 1 byte! And yet, the docs say each character takes exactly two bytes.

So really... How many bytes does one Unicode character take?

Answer: SQL Server is using UNICODE UCS-2, which

uses a single code value (defined as one or more numbers representing a code point) between 0 and 65,535 for each character, and allows exactly two bytes (one 16-bit word) to represent that value.

Which explains why SQL Server can know exactly how much space a character string will take based on its length: ALL characters take two bytes in an nvarchar column!
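You can see this from .NET as well (a sketch; note that Encoding.Unicode is UTF-16, which matches UCS-2 for characters in the Basic Multilingual Plane, while characters outside the BMP are stored as surrogate pairs and take four bytes):

    using System;
    using System.Text;

    class Ucs2Demo
    {
        static void Main()
        {
            // Every BMP character is exactly two bytes in UTF-16/UCS-2,
            // no matter how many bytes it would take in UTF-8
            Console.WriteLine(Encoding.Unicode.GetByteCount("a"));   // 2
            Console.WriteLine(Encoding.Unicode.GetByteCount("é"));   // 2
            Console.WriteLine(Encoding.Unicode.GetByteCount("漢"));  // 2

            // A character outside the BMP (this emoji) is a surrogate pair: 4 bytes
            Console.WriteLine(Encoding.Unicode.GetByteCount("😀"));  // 4
        }
    }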

Upvotes: 2

T.S.

Reputation: 19394

The correct way to do it is to check the length:

if (myString.Length > 100)
{
    MessageBox.Show("String too long");
    return; 
}

A .NET string is already Unicode and can hold whatever characters you have. In contrast, the varchar datatype is a non-Unicode (ASCII-style) type and will not support Unicode. What you need to do is declare the db column as nvarchar, and everything will work smoothly. Don't do anything crazy.
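If you're inserting from .NET, a parameterized command keeps the size in characters too (a rough sketch with System.Data.SqlClient; the connection string, table, and column names are made up for illustration):

    using System.Data;
    using System.Data.SqlClient;

    class InsertSketch
    {
        static void Insert(string myString)
        {
            // Hypothetical connection string and table/column names
            using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
            using (var command = new SqlCommand("INSERT INTO MyTable (MyColumn) VALUES (@value)", connection))
            {
                // The size of 100 is a character count, matching an nvarchar(100) column
                command.Parameters.Add("@value", SqlDbType.NVarChar, 100).Value = myString;

                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }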

Upvotes: 0

John Wu

Reputation: 52290

Use GetByteCount for the appropriate encoding (in this case, ASCII for VarChar and Unicode for NVarChar):

    var s = "This is a string";
    var len1 = s.Length;
    var len2 = System.Text.Encoding.Unicode.GetByteCount(s);
    var len3 = System.Text.Encoding.ASCII.GetByteCount(s);
    Console.WriteLine("'{0}' has {1} characters and is {2} bytes with Unicode encoding and {3} bytes with ASCII encoding.", s, len1, len2, len3);

Output:

'This is a string' has 16 characters and is 32 bytes with Unicode encoding and 16 bytes with ASCII encoding.

Upvotes: 7
