JonBrave
JonBrave

Reputation: 4280

Efficiency of varchar(max) in T-SQL code

I have a question about the "efficiency" (speed, memory usage) of (n)varchar(max) versus (n)varchar(<explicit-size>) in T-SQL code (stored procedures, functions). I am not asking about its usage in column definitions.

In existing code, I am noticing lots of occasions along the lines of, say:

CREATE TABLE [table] (
    [column] nvarchar(1000)
)

DECLARE @var nvarchar(2000)
SELECT @var = [column] FROM  TABLE WHERE ...
SET @var = @var + @somethingelse + @anotherthing ...
SET @var = REPLACE(@var, N'search', N'lotstoreplacewith')
...

The gist being (the above is only an example) I am ultimately building longer strings off limited-size string columns. It is difficult/inconvenient to know and maintain just how big these strings could become.

So, it would be simpler for me if I changed to declaring these T-SQL variables as nvarchar(max). Then I do not have to worry about what maximum size they could reach. They would be similar to a programming language's "unlimited" string size.

Does anybody know how "efficiently" SQL Server handles code which manipulates variables declared (n)varchar(max)? For example, does it reserve a huge amount of space for such a variable (I'm hoping it handles it purely dynamically), or for CAST/CONVERT(), or do string functions using it suffer some performance penalty? [If it is relevant, I have to support SQL Server 2008 R2 onward.]

[EDIT: It has been suggested that my question is a duplicate of Are there any disadvantages to always using nvarchar(MAX)?. However, all but a single post there refer to varchar(max) in column definitions, and my question explicitly states I am asking about T-SQL code (variables etc.) performance. I am posting an answer (because it is large) to my question below, which draws on that post and extends it with some new information.]

Upvotes: 1

Views: 2001

Answers (1)

JonBrave
JonBrave

Reputation: 4280

In Are there any disadvantages to always using nvarchar(MAX)? there is one answer https://stackoverflow.com/a/26120578/489865 which relates to T-SQL variables performance and not column definitions.

The gist of that post is to run SELECT @var='ABC' queries returning 1,000,000 rows, assigning to variables defined as nvarchar(<size>) versus nvarchar(max).

Under SQL Server 2008 R2, I concur with the poster's findings that nvarchar(max) is 4 times slower than nvarchar(<size>) in the example. Interestingly, if it is changed to make the assignment do slightly more work as in:

SET NOCOUNT ON;

--===== Test Variable Assignment 1,000,000 times using NVARCHAR(300)
DECLARE @SomeString NVARCHAR(300),
        @StartTime DATETIME
;
 SELECT @startTime = GETDATE()
;
 SELECT TOP 1000000
        @SomeString = 'ABC' + ac1.[name] + ac2.[name]
   FROM master.sys.all_columns ac1,
        master.sys.all_columns ac2
;
 SELECT Duration    = DATEDIFF(ms,@StartTime,GETDATE())
;
GO
--===== Test Variable Assignment 1,000,000 times using NVARCHAR(4000)
DECLARE @SomeString NVARCHAR(4000),
        @StartTime DATETIME
;
 SELECT @startTime = GETDATE()
;
 SELECT TOP 1000000
        @SomeString = 'ABC' + ac1.[name] + ac2.[name]
   FROM master.sys.all_columns ac1,
        master.sys.all_columns ac2
;
 SELECT Duration    = DATEDIFF(ms,@StartTime,GETDATE())
;
GO
--===== Test Variable Assignment 1,000,000 times using VARCHAR(MAX)
DECLARE @SomeString NVARCHAR(MAX),
        @StartTime DATETIME
;
 SELECT @startTime = GETDATE()
;
 SELECT TOP 1000000
        @SomeString = 'ABC' + ac1.[name] + ac2.[name]
   FROM master.sys.all_columns ac1,
        master.sys.all_columns ac2
;
 SELECT Duration    = DATEDIFF(ms,@StartTime,GETDATE())
;
GO

(note the + ac1.[name] + ac2.[name]) then the nvarchar(max) takes only twice as long. So in practice performance hit for nvarchar(max) may be better than at first seems.

Upvotes: 1

Related Questions