user1002479
user1002479

Reputation: 235

Combine multiple rows into list for multiple columns

I'm aware that the "combine multiple rows into list" question has been answered a million times, and here's a reference to an awesome article: Concatenating row values in transact sql

I have a need to combine multiple rows into lists for multiple columns at the same time

 ID | Col1 | Col2       ID | Col1 | Col2 
------------------  =>  ------------------
  1    A     X           1    A     X    
  2    B     Y           2    B,C   Y,Z
  2    C     Z

I tried to use the xml method, but this has proven to be very slow over large tables

SELECT DISTINCT
    [ID],
    [Col1] = STUFF((SELECT ',' + t2.[Col1]
                    FROM #Table t2
                    WHERE t2.ID = t.ID
                    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),1,1,''),
    [Col2] = STUFF((SELECT ',' + t2.[Col2]
                    FROM #Table t2
                    WHERE t2.ID = t.ID
                    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),1,1,''),
FROM #Table t

My current solution is to use a stored procedure that builds each ID row separately. I'm wondering if there's another approach I could use (other than using a loop)

For each column, rank the rows to combine (partition by the key column)

End up with a table like
ID  | Col1 | Col2 | Col1Rank | Col2Rank
1      A      X        1          1
2      B      Y        1          1
2      C      Z        2          2

Create a new table containing top rank columns for each ID
ID  | Col1Comb | Col2Comb
1       A           X
2       B           Y

Loop through each remaining rank in increasing order (in this case 1 iteration)
for irank = 0; irank <= 1; irank++
    update n set
       n.col1Comb = n.Col1Comb + ',' + o.Col1,  -- so append the rank 2 items
       n.col2comb = n.Col2Comb + ',' + o.Col2   -- if they are not null
    from #newtable n
    join #oldtable o
       on o.ID = n.ID
    where o.col1rank = irank or o.col2rank = irank

Upvotes: 4

Views: 6227

Answers (3)

Devart
Devart

Reputation: 121902

Try this one -

Query1:

DECLARE @temp TABLE
(
      ID INT
    , Col1 VARCHAR(30)
    , Col2 VARCHAR(30)
)

INSERT INTO @temp (ID, Col1, Col2)
VALUES 
    (1, 'A', 'X'),
    (2, 'B', 'Y'),
    (2, 'C', 'Z')

SELECT
      r.ID
    , Col1 = STUFF(REPLACE(REPLACE(CAST(d.x.query('/t1/a') AS VARCHAR(MAX)), '<a>', ','), '</a>', ''), 1, 1, '')
    , Col2 = STUFF(REPLACE(REPLACE(CAST(d.x.query('/t2/a') AS VARCHAR(MAX)), '<a>', ','), '</a>', ''), 1, 1, '')
FROM (
    SELECT DISTINCT ID
    FROM @temp
) r
OUTER APPLY (
    SELECT x = CAST((
        SELECT 
                [t1/a] = t2.Col1
              , [t2/a] = t2.Col2
        FROM @temp t2
        WHERE r.ID = t2.ID
        FOR XML PATH('')
    ) AS XML)
) d

Query 2:

SELECT
      r.ID
    , Col1 = STUFF(REPLACE(CAST(d.x.query('for $a in /a return xs:string($a)') AS VARCHAR(MAX)), ' ,', ','), 1, 1, '') 
    , Col2 = STUFF(REPLACE(CAST(d.x.query('for $b in /b return xs:string($b)') AS VARCHAR(MAX)), ' ,', ','), 1, 1, '') 
FROM (
    SELECT DISTINCT ID
    FROM @temp
) r
OUTER APPLY (
    SELECT x = CAST((
        SELECT 
                [a] = ',' + t2.Col1
              , [b] = ',' + t2.Col2
        FROM @temp t2
        WHERE r.ID = t2.ID
        FOR XML PATH('')
    ) AS XML)
) d

Output:

ID          Col1       Col2
----------- ---------- ----------
1           A          X
2           B,C        Y,Z

Upvotes: 1

John Dewey
John Dewey

Reputation: 7093

A CTE trick can be used where you update the CTE.

Method 1: a new parallel table to which the data is copied and then concatenated:

CREATE TABLE #Table1(ID INT, Col1 VARCHAR(1), Col2 VARCHAR(1), RowID INT IDENTITY(1,1));
CREATE TABLE #Table1Concat(ID INT, Col3 VARCHAR(MAX), Col4 VARCHAR(MAX), RowID INT);
GO

INSERT #Table1 VALUES(1,'A','X'), (2,'B','Y'), (2,'C','Z');
GO
INSERT #Table1Concat
SELECT * FROM #Table1;
GO
DECLARE @Cat1 VARCHAR(MAX) = '';
DECLARE @Cat2 VARCHAR(MAX) = '';
; WITH CTE AS (
    SELECT TOP 2147483647 t1.*, t2.Col3, t2.Col4, r = ROW_NUMBER()OVER(PARTITION BY t1.ID ORDER BY t1.Col1, t1.Col2)
    FROM #Table1 t1
    JOIN #Table1Concat t2 ON t1.RowID = t2.RowID
    ORDER BY t1.ID, t1.Col1, t1.Col2
)
UPDATE CTE
SET @Cat1 = Col3 = CASE r WHEN 1 THEN ISNULL(Col1,'') ELSE @Cat1 + ',' + Col1 END
, @Cat2 = Col4 = CASE r WHEN 1 THEN ISNULL(Col2,'') ELSE @Cat2 + ',' + Col2 END;
GO

SELECT ID, Col3 = MAX(Col3) 
, Col4 = MAX(Col4)
FROM #Table1Concat
GROUP BY ID

Method 2: Add the concatenation columns directly to the original table and concatenate the new columns:

CREATE TABLE #Table1(ID INT, Col1 VARCHAR(1), Col2 VARCHAR(1), Col1Cat VARCHAR(MAX), Col2Cat VARCHAR(MAX));
GO

INSERT #Table1(ID,Col1,Col2) VALUES(1,'A','X'), (2,'B','Y'), (2,'C','Z');
GO

DECLARE @Cat1 VARCHAR(MAX) = '';
DECLARE @Cat2 VARCHAR(MAX) = '';
; WITH CTE AS (
    SELECT TOP 2147483647 t1.*, r = ROW_NUMBER()OVER(PARTITION BY t1.ID ORDER BY t1.Col1, t1.Col2)
    FROM #Table1 t1
    ORDER BY t1.ID, t1.Col1, t1.Col2
)
UPDATE CTE
SET @Cat1 = Col1Cat = CASE r WHEN 1 THEN ISNULL(Col1,'') ELSE @Cat1 + ',' + Col1 END
, @Cat2 = Col2Cat = CASE r WHEN 1 THEN ISNULL(Col2,'') ELSE @Cat2 + ',' + Col2 END;
GO

SELECT ID, Col1Cat = MAX(Col1Cat) 
, Col2Cat = MAX(Col2Cat)
FROM #Table1
GROUP BY ID;
GO

Upvotes: 3

Solomon Rutzky
Solomon Rutzky

Reputation: 48776

One solution, one that is at least syntactically straight-forward, is to use a User-Defined Aggregate to "Join" the values together. This does require SQLCLR and while some folks are reluctant to enable it, it does provide for a set-based approach that does not need to re-query the base table per each column. Joining is the opposite of Splitting and will create a comma-separated list of what was individual rows.

Below is a simple example that uses the SQL# (SQLsharp) library which comes with a User-Defined Aggregate named Agg_Join() that does exactly what is being asked for here. You can download the Free version of SQL# from http://www.SQLsharp.com/ and the example SELECTs from a standard system view. (And to be fair, I am the author of SQL# but this function is available for free).

SELECT sc.[object_id],
       OBJECT_NAME(sc.[object_id]) AS [ObjectName],
       SQL#.Agg_Join(sc.name) AS [ColumnNames],
       SQL#.Agg_Join(DISTINCT sc.system_type_id) AS [DataTypes]
FROM sys.columns sc
GROUP BY sc.[object_id]

I recommend testing this against your current solution(s) to see which is the fastest for the volume of data you expect to have in at least the next year or two.

Upvotes: 0

Related Questions