MoonKnight
MoonKnight

Reputation: 23833

Updating row columns from a comma-separated column from the same table

I have the following table definition

CREATE TABLE _Table 
(
    [Pat] NVARCHAR(8), 
    [Codes] NVARCHAR(50), 
    [C1] NVARCHAR(6), 
    [C2] NVARCHAR(6), 
    [C3] NVARCHAR(6), 
    [C4] NVARCHAR(6), 
    [C5] NVARCHAR(6)
);
GO

INSERT INTO _Table ([Pat], [Codes], [C1], [C2], [C3], [C4], [C5])
VALUES
    ('Pat1', 'U212,Y973,Y982', null, null, null, null, null),
    ('Pat2', 'M653', null, null, null, null, null), 
    ('Pat3', 'U212,Y973,Y983,Z924,Z926', null, null, null, null, null);
GO  

SQL Fiddle here.

Now, I would like to split the codes for each row and populate the Cn columns so we end up with

Pat     Codes                       C1      C2      C3      C4      C5
Pat1    'U212,Y973,Y982'            U212    Y973    Y982    NULL    NULL
Pat2    'M653'                      M653    NULL    NULL    NULL    NULL
Pat3    'U212,Y973,Y983,Z924,Z926'  U212    Y973    Y983    Z924    Z926

I am looking at dynamic SQL but is there a better way...

I have started down the CTE route, but I am weak here. I am essentially looping, removing the first comma separated code and using left to get that code and selected it as C1.

;WITH tmp([Pat], [Codes], [C1], [C2], [C3], [C4], [C5]) AS
(
    SELECT
        Pat,
        STUFF(Codes, 1, CHARINDEX(',', Codes + ','), ''), 
        LEFT(Codes, CHARINDEX(',', Codes + ',') - 1), 
        [C2], 
        [C3], 
        [C4], 
        [C5]
    FROM _Table 
    UNION all
    SELECT
        Pat,
        STUFF(Codes, 1, CHARINDEX(',', Codes + ','), ''), 
        LEFT(Codes, CHARINDEX(',', Codes + ',') - 1), 
        [C2], 
        [C3], 
        [C4], 
        [C5]
    FROM _Table 
    WHERE
        Codes > ''
)
SELECT Pat, Codes,  [C1], [C2], [C3], [C4], [C5] 
FROM tmp 
ORDER BY Pat

This works for one code, but how do I do all 5? Note, in practice this could increase to N codes.

Upvotes: 6

Views: 332

Answers (6)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67291

My suggestion uses an updatable CTE and XML as splitter. XML allows to address each fragment by its position:

WITH updatableCTE AS
(
    SELECT *
          ,CAST('<x>' + REPLACE(Codes,',','</x><x>') + '</x>' AS XML) AS CastedToXml
    FROM _Table
)
UPDATE updatableCTE SET  C1=CastedToXml.value('/x[1]','nvarchar(6)')
                        ,C2=CastedToXml.value('/x[2]','nvarchar(6)')  
                        ,C3=CastedToXml.value('/x[3]','nvarchar(6)')  
                        ,C4=CastedToXml.value('/x[4]','nvarchar(6)')  
                        ,C5=CastedToXml.value('/x[5]','nvarchar(6)'); 

SELECT * FROM _Table

This is very easy to scale to any needed number of fragments.

Upvotes: 1

Zohar Peled
Zohar Peled

Reputation: 82474

Though I really like Alan Burstein's answer, another option involves using Jeff Moden's DelimitedSplit8K, conditional aggregation and stacking a couple of common table expressions to get the values you want:

;WITH CTE1 AS
(
    SELECT Pat, Codes, ItemNumber, Item
    FROM _Table
    CROSS APPLY [dbo].[DelimitedSplit8K](Codes, ',')
), CTE2 AS
(
    SELECT  Pat, 
            Codes, 
            MAX(CASE WHEN ItemNumber = 1 THEN Item END) As V1, 
            MAX(CASE WHEN ItemNumber = 2 THEN Item END) As V2, 
            MAX(CASE WHEN ItemNumber = 3 THEN Item END) As V3, 
            MAX(CASE WHEN ItemNumber = 4 THEN Item END) As V4, 
            MAX(CASE WHEN ItemNumber = 5 THEN Item END) As V5
    FROM CTE1
    GROUP BY [Pat], [Codes]
)

UPDATE t
SET C1 = V1,
    C2 = V2,
    C3 = V3,
    C4 = V4,
    C5 = V5
FROM _Table t
JOIN CTE2 ON t.Pat = CTE2.Pat

You can see an online sample or rextester.

Upvotes: 1

Alan Burstein
Alan Burstein

Reputation: 7918

If I understand the requirement correctly this is extremely simple. No splitting or other type of function, Dynamic SQL, recursive CTEs, PIVOTING or any other skulduggery is necessary.

To perform the "split" you can use CROSS APPLY like so:

SELECT 
  Pat,
  Codes,
  C1 = SUBSTRING(Codes,1,ISNULL(d1.d-1,8000)),
  C2 = SUBSTRING(Codes,d1.d+1, d2.d-d1.d-1),
  C3 = SUBSTRING(Codes,d2.d+1, d3.d-d2.d-1),
  C4 = SUBSTRING(Codes,d3.d+1, d4.d-d3.d-1),
  C5 = SUBSTRING(Codes,d4.d+1, 8000)
FROM _Table
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes),0)))        d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d3.d+1),0))) d4(d);

Returns

Pat      Codes                         C1    C2     C3    C4    C5    
-------- ----------------------------- ----- ------ ----- ----- ------
Pat1     U212,Y973,Y982                U212  Y973   NULL  NULL  NULL
Pat2     M653                          M653  NULL   NULL  NULL  NULL
Pat3     U212,Y973,Y983,Z924,Z926      U212  Y973   Y983  Z924  Z926

Note the super-simple and utltra-efficient execution plan:

enter image description here

You can simplify this even further if the codes are always four character like so:

SELECT
  Pat,
  Codes,
  C1 = NULLIF(SUBSTRING(Codes,1,4),''),
  C2 = NULLIF(SUBSTRING(Codes,6,4),''),
  C3 = NULLIF(SUBSTRING(Codes,11,4),''),
  C4 = NULLIF(SUBSTRING(Codes,16,4),''),
  C2 = NULLIF(SUBSTRING(Codes,21,4),'')
FROM _Table;

To perform the update you would do this for first solution:

UPDATE _Table
SET 
  C1 = SUBSTRING(Codes,1,ISNULL(d1.d-1,8000)),
  C2 = SUBSTRING(Codes,d1.d+1, d2.d-d1.d-1),
  C3 = SUBSTRING(Codes,d2.d+1, d3.d-d2.d-1),
  C4 = SUBSTRING(Codes,d3.d+1, d4.d-d3.d-1),
  C5 = SUBSTRING(Codes,d4.d+1, 8000)
FROM _Table
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes),0)))        d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',',Codes,d3.d+1),0))) d4(d);

If, again, the codes are only four characters long then the update is so easy it feels like cheating:

UPDATE _Table
SET C1 = NULLIF(SUBSTRING(Codes,1,4),''),
    C2 = NULLIF(SUBSTRING(Codes,6,4),''),
    C3 = NULLIF(SUBSTRING(Codes,11,4),''),
    C4 = NULLIF(SUBSTRING(Codes,16,4),''),
    C5 = NULLIF(SUBSTRING(Codes,21,4),'');

Upvotes: 6

JohnLBevan
JohnLBevan

Reputation: 24410

Here's one possible solution: SQL Fiddle

with cte (PAT, CNum, Indx) as 
(
  select PAT
  , 1
  , 1
  from _table

  union all

  select a.PAT
  , b.CNum + 1
  , charindex(',', a.CODES, b.Indx+1)+1
  from _table a
  inner join cte b
  on b.PAT = a.PAT
  where charindex(',', a.CODES, b.Indx+1) > 0
)
select t.PAT
--, t.CODES --include to see the original codes value
, max(case when c1.CNUM = 1 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C1
, max(case when c1.CNUM = 2 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C2
, max(case when c1.CNUM = 3 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C3
, max(case when c1.CNUM = 4 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C4
, max(case when c1.CNUM = 5 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end) C5
from _Table t
left outer join cte c1
on c1.PAT = t.PAT
left outer join cte c2
on c2.PAT = c1.PAT
and c2.CNum = c1.CNum + 1
group by t.PAT
--, t.CODES --include to see the original codes value

This uses a recursive CTE to fetch the starting positions of each of the values in CODES. The first position is taken as 1; subsequent positions are the char index of the next comma plus one (so we get the character after the comma). We record these positions in value INDX.

We also have CNum in our recursive CTE to record which (C#) field the related record will be related to; this is a simple counter for each result returned.

i.e. For the example data on the SQL Fiddle link, the result of our CTE looks like this:

PAT   | CNum    |   Indx
------+---------+-------
Pat1  |    1    |    1
Pat1  |    2    |    6
Pat1  |    3    |   11
Pat1  |    4    |   16
Pat1  |    5    |   21
Pat2  |    1    |    1
Pat3  |    1    |    1
Pat3  |    2    |    6
Pat3  |    3    |   11
Pat3  |    4    |   16
Pat3  |    5    |   21
Pat3  |    6    |   26
Pat3  |    7    |   31
Pat3  |    8    |   36
Pat3  |    9    |   41
Pat3  |   10    |   46
Pat4  |    1    |   1

We then collapse this down using our group by expression; essentially doing a pivot, putting each CNum value against its related column.

The max is just to ensure that we ignore all the null values / taken the only result with a value for the given CNum.

The code case when c1.CNUM = 1 then substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) end performs a substring between the first character of the current item, taking its length up to 1 before the next character; or if there is no next character the length of the CODES string.


Update

Here's an updated version which makes use of the pivot function / all other logic is as above: SQL Fiddle

with cte (PAT, CNum, Indx) as 
(
  select PAT
  , 1
  , 1
  from @table
  where CODES != ''
  and CODES is not null

  union all

  select a.PAT
  , b.CNum + 1
  , charindex(',', a.CODES, b.Indx+1)+1
  from @table a
  inner join cte b
  on b.PAT = a.PAT
  where charindex(',', a.CODES, b.Indx+1) > 0
)
select PAT
, CODES
, [1] C1
, [2] C2
, [3] C3
, [4] C4
, [5] C5
from 
(
  select t.PAT
  , t.CODES
  , c1.CNum CNum1
  , substring(t.CODES,c1.INDX,coalesce(c2.INDX,LEN(t.CODES)+2)-c1.INDX-1) value
  from @table t
  left outer join cte c1
  on c1.PAT = t.PAT
  left outer join cte c2
  on c2.PAT = c1.PAT
  and c2.CNum = c1.CNum + 1
) x
pivot
(
  max(value) 
  for CNum1 in ([1],[2],[3],[4],[5])
) pvt
order by PAT

Upvotes: 1

Lukasz Szozda
Lukasz Szozda

Reputation: 175596

I have started down the CTE route, but I am weak here. I am essentially looping, removing the first comma separated code and using left to get that code and selected it as C1.

There are many possible' CSV splitters in SQL Server. I just want to extend your idea with cte (recursive version combined with conditional aggregation):

;WITH tmp([Pat], [Codes],x, lvl) AS
(
  SELECT
   Pat,
   Codes = CAST(STUFF(Codes,1,CHARINDEX(',', Codes + ','), '')AS NVARCHAR(MAX)), 
   x = CAST(LEFT(Codes, CHARINDEX(',', Codes + ',') - 1) AS NVARCHAR(MAX)),
   lvl = 1
  FROM _Table
  -- WHERE c1 IS NULL AND c2 IS NULL AND ...
  -- to avoid recalculating the same rows if run many times
  UNION ALL
  SELECT  Pat,
    Codes = STUFF(Codes,1,CHARINDEX(',', Codes + ','), ''), 
    x= LEFT(Codes, CHARINDEX(',', Codes + ',') - 1),
    lvl = lvl+1
  FROM tmp
  WHERE Codes > ''
)
SELECT Pat
    ,c1 = MAX(IIF(lvl=1, x, NULL))
    ,c2 = MAX(IIF(lvl=2, x, NULL))
    ,c3 = MAX(IIF(lvl=3, x, NULL))
    ,c4 = MAX(IIF(lvl=4, x, NULL))
    ,c5 = MAX(IIF(lvl=5, x, NULL))
    -- add more if necessary
FROM tmp 
GROUP BY Pat
-- OPTION (MAXRECURSION 0);

DBFiddle Demo


And UPDATE:

;WITH tmp([Pat], [Codes],x, lvl) AS
(
    SELECT
        Pat,
        Codes=CAST(STUFF(Codes,1,CHARINDEX(',',Codes+','),'')AS NVARCHAR(MAX)), 
        x = CAST(LEFT(Codes, CHARINDEX(',', Codes + ',') - 1) AS NVARCHAR(MAX)),
        lvl = 1
    FROM _Table
    UNION ALL
    SELECT  Pat,
        Codes = STUFF(Codes, 1, CHARINDEX(',', Codes + ','), ''), 
        x= LEFT(Codes, CHARINDEX(',', Codes + ',') - 1),
        lvl = lvl+1
    FROM tmp
    WHERE Codes > ''
), cte2 AS (
SELECT Pat
    ,c1 = MAX(IIF(lvl=1, x, NULL))
    ,c2 = MAX(IIF(lvl=2, x, NULL))
    ,c3 = MAX(IIF(lvl=3, x, NULL))
    ,c4 = MAX(IIF(lvl=4, x, NULL))
    ,c5 = MAX(IIF(lvl=5, x, NULL))
FROM tmp 
GROUP BY Pat
)
UPDATE _Table
SET c1 = c.c1
   ,c2 = c.c2
   ,c3 = c.c3
   ,c4 = c.c4
   ,c5 = c.c5
FROM _Table t
JOIN cte2 c
  ON t.Pat = c.Pat
 OPTION (MAXRECURSION 0);

DBFiddle Demo

Output:

╔══════╦══════════════════════════╦══════╦══════╦══════╦══════╦══════╗
║ Pat  ║          Codes           ║  C1  ║  C2  ║  C3  ║  C4  ║  C5  ║
╠══════╬══════════════════════════╬══════╬══════╬══════╬══════╬══════╣
║ Pat1 ║ U212,Y973,Y982           ║ U212 ║ Y973 ║ Y982 ║ null ║ null ║
║ Pat2 ║ M653                     ║ M653 ║ null ║ null ║ null ║ null ║
║ Pat3 ║ U212,Y973,Y983,Z924,Z926 ║ U212 ║ Y973 ║ Y983 ║ Z924 ║ Z926 ║
╚══════╩══════════════════════════╩══════╩══════╩══════╩══════╩══════╝

Final thought: The correct way is to normalize schema.

Upvotes: 2

user7396598
user7396598

Reputation: 1289

What MatBailie is trying to get you to see is that having a comma separated list as a column, or having C1 - C(n) columns is a bad idea.

Better to make a second table. It will have a foreign key to the parent table (_Table in your example).

To do this you need to do a couple of things.

First, add an identity column to _Table. This will give you a unique key to use in other tables. Identity is an easy solution without knowing the full scope of your data. You could make it a different, unique column.

Second make a child table, for example PatCodes. It's columns should included:

Id int not null (identity)
_TableId int not null (fk to _Table)
Code varchar(xx) not null 

(optional that might be nice to have)
Description varchar(xxx) null 
DateUpdated datetime null
Active bit not null default 1 

In this way you can have zero-to-many codes for each entry in _Table. You will not need to know the maximum (which could even change in the future) number of Cx columns to add to _Table. You will not be flooded with a large number of NULL column values in _Table. You have a ton of flexibility for managing updates to changing codes for a given _Table entry. I could go on and on about the benefits of normalizing this correctly from the start.

Changing your code to populate this table instead of an unknown number of columns on _Table is also easy to do.

Just do this now, before you end up with a ton of nasty data in the database and/or an application with loads of clutter around your _Table objects, it will save you much time and effort.

Upvotes: 0

Related Questions