Reputation: 5008

Group String Concatenation: which approach, if any, is any more guaranteed?

Itzik Ben-Gan explains in his book 'Inside Microsoft® SQL Server® 2008: T-SQL' how Appoach A works, because of the undocumented behavior of SQL Server in which it performs assignment for each result record from the SELECT.

A well respected collegue and DB guru has suggested that Approach B is guaranteed to work. His argument is based on the recursive nature of COALESCE versus the 'values expansion' method of CAST.

Actually, I have no idea what 'values expansion' refers to (except for the fact that it is casting one value to another) or how it applies to this problem? Perhaps he misunderstood? Yes COALESCE is recursive in a sense, but as far as I can see it is irrelevant and the desired result is produced because of the undocumented behavior of multiple assignment.

Is he correct? Please, no "Use FOR XML PATH instead" answers!

Approach A

DECLARE @output VARCHAR(100);
SET @output = ''; 

SELECT @output = @output + CAST(COL_VCHAR AS VARCHAR(10)) + ';' 
FROM someTable;

Approach B

DECLARE @output VARCHAR(100);
SELECT @output = COALESCE(@output + ', ', '') + COL_VCHAR  
from someTable;

Upvotes: 4

Answers (4)

Aaron Bertrand

Reputation: 280644

I believe they both work because of the same underlying (and undocumented) behavior. I am fairly confident that if you take a case where Itzik has demonstrated that Approach A fails, that Approach B will also fail in the same way and for the same reasons, in spite of your colleague's claims. I don't see how

DECLARE @output VARCHAR(100);
SELECT @output = COALESCE(@output + ', ', '') + COL_VCHAR

Is any different from:

DECLARE @output VARCHAR(100) = '';
SELECT @output = @output + ', ' + COL_VCHAR
-- or SELECT @output += ',' + COL_VCHAR

So what exactly does COALESCE magically introduce? SQL Server isn't going to change its plan because of something you're doing to the output, AFAIK.

I use them all the time for dynamic SQL generation and don't recall ever seeing them fail, and I realize that's not your question. Unfortunately there isn't really a way to prove it unless you know of a case that fails for either approach.

I wrote a blog post about string concatenation a few months ago. It's not completely relevant to your problem but Rob Farley made a comment that may be considered another downside to the COALESCE approach:

https://sqlblog.org/2011/03/08/t-sql-tuesday-16-this-is-not-the-aggregate-youre-looking-for

Upvotes: 4

Damien_The_Unbeliever

Reputation: 239824

Since a conforming SQL implementation could evaluate all of the output rows simultaneously, in parallel, neither is guaranteed to work. That they happen to work, today, is an artifact of the current SQL Server implementation.

Your colleague is incorrect to assert that COALESCE somehow changes the processing model.

I.e. a conforming implementation could effectively hand each potential row in the result set (having evaluated FROM and WHERE) to a separate thread, that then performs whatever processing is required in the SELECT clause (before presumably recombining results in order to assess GROUP BY, HAVING and ORDER BY).

There are no standard requirements governing access to variables during such processing, so each thread could "see" the same initial value of @output (NULL or '', depending on which form you're using), perform it's own update calculation, and assign that result value to @output - the final value of @output might then any of the individual row results - or anything else for that matter.

Upvotes: 2

t-clausen.dk

Reputation: 44346

They work differently, the first example adds an additional character to the end of the output variable, and yes they work because of the same behaviour.

The first example will work if your column had been a numeric or date also. It would however only take the first 10 characters of any value in the column. The second example would not cut off the column.

There is absolutely no recursive nature of COALESCE. COALESCE is a way of replacing null values with replacement values.

EDIT:

I should add that i also prefer the second version, because you don't get that extra semicolon, in the end of output implying that more values will follow.

Upvotes: 0

CD Jorgensen

Reputation: 1401

In your example, there is a way to get different results, and that is if NULL values are introduced into the data set. e.g.:

set nocount on

declare @test table (value int)

insert into @test values (10)
insert into @test values (20)
insert into @test values (null)
insert into @test values (40)
insert into @test values (50)

DECLARE @output VARCHAR(max);

-- Approach A
SET @output = ''; 

SELECT @output = @output + CAST(value AS VARCHAR(10)) + ';' 
FROM @test

print 'Result from A:'
print isnull(@output, '{null}')

-- Approach B
set @output = ''
SELECT @output = COALESCE(@output + ', ', '') + cast(value as varchar(10))
from @test

print 'Result from B:'
print isnull(@output, '{null}')

set nocount off

Approach A will return null, whereas Approach B will return 40, 50

Upvotes: 0

Group String Concatenation: which approach, if any, is any more guaranteed?

Answers (4)

Related Questions