Reputation: 7092
I have found that there is no performance difference between varchar(20) and varchar(40).
But then I asked myself a logical question: why not create a table where every character column is varchar(4000)?
In my opinion there must be some distinction between varchar(3) and varchar(4000) even when my column only ever contains 3 characters, but I can't find that distinction.
Upvotes: 2
Views: 2652
Reputation: 69494
Some users have explained that VARCHAR(10) will have the same performance as VARCHAR(1000); I think this is not entirely true.
VARCHAR(10) means you can store up to 10 characters in that field. VARCHAR(1000) means you can store up to 1000 characters in that field.
Now, if SQL Server has to pull data from a table whose column is declared VARCHAR(1000) but only holds strings 10 characters long, then yes, in this case the performance will be the same as if it had been a VARCHAR(10) column.
But saying that VARCHAR(1000) will have the same performance as VARCHAR(10) even when the VARCHAR(1000) column stores far more than 10 characters isn't true at all. I see these allowed character lengths more as constraints that stop garbage data from coming in.
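As an illustration (a minimal sketch with a hypothetical table, not from the original question), the declared length acts as a simple data-quality gate:
-- Hypothetical example: a two-letter state code column declared as VARCHAR(2)
CREATE TABLE dbo.Customers
(
id INT IDENTITY(1,1) PRIMARY KEY
,StateCode VARCHAR(2) NOT NULL
)
INSERT INTO dbo.Customers (StateCode) VALUES ('TX')       -- succeeds
INSERT INTO dbo.Customers (StateCode) VALUES ('Texas!!')  -- fails with
-- "String or binary data would be truncated" (with the default ANSI_WARNINGS ON);
-- a VARCHAR(1000) column would happily accept the garbage value.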
Upvotes: 0
Reputation:
There is no difference in storage space, exactly as defined in the documentation from Microsoft:
varchar [ ( n | max ) ] Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying. http://technet.microsoft.com/en-us/library/ms176089.aspx
The difference you do have is the potential for bad data. Columns should be defined to allow the maximum valid input, not to allow just anything. I have seen VARCHAR(MAX) columns blow up a database because someone ran a bad UPDATE statement and filled every row with almost 2 GB worth of data.
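A quick way to see the storage point (a small sketch using DATALENGTH) is that the stored size follows the data, not the declared length:
DECLARE @small VARCHAR(3) = 'abc'
DECLARE @large VARCHAR(4000) = 'abc'
SELECT DATALENGTH(@small) AS SmallBytes  -- 3
     , DATALENGTH(@large) AS LargeBytes  -- 3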
Run the following query and look at the execution plan.
DECLARE @tableOne TABLE
(
id INT
,MySmallField VARCHAR(3)
)
DECLARE @tableTwo TABLE
(
id INT
,MyBigField VARCHAR(3000)
)
INSERT INTO @tableOne
VALUES( 0, 'aaa' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 5, 'ddd' )
INSERT INTO @tableTwo
VALUES( 0, 'aaa' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 1, 'aaa' ),( 2, 'aaa' ),( 3, 'bbb' ),( 4, 'ccc' ),( 5, 'ddd' ),( 5, 'ddd' )
SELECT * FROM @tableOne WHERE MySmallField = 'ccc'
SELECT * FROM @tableTwo WHERE MyBigField = 'ccc'
Download SQL Sentry Plan Explorer; it is a free tool with an interface that is easier to work with than SSMS.
Looking at the execution plan for this simple statement, you will see that for the insert into the large column SQL Server added a Compute Scalar operator that does the following:
Expr1104=CONVERT_IMPLICIT(varchar(3000),[Union1103],0)
For this operation the data size rose from 784 B to 74 KB, and that was with an insert of only 50 rows. The bigger the operation, the larger this impact will be.
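If you don't want to install Plan Explorer, a rough sketch of the same check in plain SSMS (appended to the end of the script above, since table variables are scoped to their batch) is to capture the actual plan as XML and look for the Compute Scalar / CONVERT_IMPLICIT node and the data sizes on the plan arrows:
-- Return the actual execution plan as XML for the following statements
SET STATISTICS XML ON
SELECT * FROM @tableOne WHERE MySmallField = 'ccc'
SELECT * FROM @tableTwo WHERE MyBigField = 'ccc'
SET STATISTICS XML OFF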
Upvotes: 2
Reputation: 63722
The simple answer is "because you're reserving more memory than you need". Really, why stop at varchar size? Why not use a single big varbinary column for everything? Or xml? Why not make all the columns nullable? Why use constraints and foreign keys at all? The benefit of a schema is exactly that: you have something you can rely on. It has its own deficiencies, of course (chiefly less flexibility), but that's why this is not the only method of storing data (for example, document databases often don't require a schema at all, which offers more flexibility and doesn't require you to declare your intent twice).
The performance aspect is much more complicated. SQL Server stores data in fixed-size pages, which limits you to 8,060 bytes of in-row data per row. Splitting data across pages happens transparently, but it carries a performance penalty. Limiting your maximum row size lets you judge performance better. http://technet.microsoft.com/en-us/library/ms186981(v=sql.105).aspx says a bit more about the page size issues.
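A hedged sketch of what happens when the declared lengths let a row outgrow that limit (hypothetical table and column names):
-- Three VARCHAR(4000) columns can exceed the ~8,060-byte in-row limit;
-- SQL Server silently pushes the excess to ROW_OVERFLOW_DATA pages,
-- which costs extra I/O on every read of those rows.
CREATE TABLE dbo.OverflowDemo
(
id INT IDENTITY(1,1) PRIMARY KEY
,ColA VARCHAR(4000)
,ColB VARCHAR(4000)
,ColC VARCHAR(4000)
)
INSERT INTO dbo.OverflowDemo (ColA, ColB, ColC)
VALUES ( REPLICATE('a', 4000), REPLICATE('b', 4000), REPLICATE('c', 4000) )
-- Expect both IN_ROW_DATA and ROW_OVERFLOW_DATA allocation units here
SELECT alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.OverflowDemo'), NULL, NULL, 'DETAILED')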
Even more importantly, the length comes back to bite you when considering indexes. An index on varchar(3) is just fine; an index on varchar(4000) is just ridiculous. It's going to hurt your performance on lookups and inserts alike, and you really don't want that.
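For example (a sketch, with the caveat that the exact limit depends on the SQL Server version), indexing a wide varchar column immediately warns you about the key-size limit:
CREATE TABLE dbo.IndexDemo ( BigCol VARCHAR(4000) )
CREATE INDEX IX_IndexDemo_BigCol ON dbo.IndexDemo (BigCol)
-- Warning: the maximum key length is 900 bytes (1,700 bytes for nonclustered
-- indexes on recent versions); any insert or update that pushes BigCol past
-- that limit will fail, while a varchar(3) key can never get near it.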
Sparse columns complicate things further. It's also convenient to have a reasonable size on stored procedure (and query) parameters. All those things tend to prevent you from making silly mistakes, not to mention that they help against trolls entering unreasonable data :)
In the end, the idea is "give the engine as much information as possible to make sure you're getting the performance you need".
Upvotes: 1
Reputation: 5271
It will not make a difference from a storage standpoint.
It will only affect your sort performance: if your rows really are filled up, the sort will spill to tempdb; and if your varchar columns are declared as 5000 bytes but actually hold far less than that, your query will be granted memory it doesn't need, which is obviously inefficient and can lead to waits for memory grants.
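A sketch of how to observe this (hypothetical tables Narrow and Wide holding identical data, differing only in the declared varchar length):
-- The sort memory grant is estimated from the declared column size,
-- so the Wide query requests far more memory than it ever uses.
SELECT SomeCol FROM dbo.Narrow ORDER BY SomeCol  -- SomeCol VARCHAR(50)
SELECT SomeCol FROM dbo.Wide   ORDER BY SomeCol  -- SomeCol VARCHAR(5000)
-- Compare requested vs. used memory while the queries are still executing
SELECT requested_memory_kb, granted_memory_kb, used_memory_kb
FROM sys.dm_exec_query_memory_grants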
A good approach is to choose a maximum length value that is reasonable according to the actual data to be stored.
Upvotes: 1