missscripty
missscripty

Reputation: 537

SQL Split Multiple Multivalue Columns into Rows

I have data that was sent to me, and I need to normalize it. The data is in a sql table, but each row has multiple multi value columns. An example is the following:

ID  fname   lname       projects           projdates
1   John    Doe         projA;projB;projC  20150701;20150801;20150901
2   Jane    Smith       projD;;projC       20150701;;20150902
3   Lisa    Anderson    projB;projC        20150801;20150903
4   Nancy   Johnson     projB;projC;projE  20150601;20150822;20150904
5   Chris   Edwards     projA              20150905

Needs too look like this:

ID  fname   lname      projects projdates
1   John    Doe          projA  20150701
1   John    Doe          projB  20150801
1   John    Doe          projC  20150901
2   Jane    Smith        projD  20150701
2   Jane    Smith        projC  20150902
3   Lisa    Anderson     projB  20150801
3   Lisa    Anderson     projC  20150903
4   Nancy   Johnson      projB  20150601
4   Nancy   Johnson      projC  20150822
4   Nancy   Johnson      projE  20150904
5   Chris   Edwards      projA  20150905

I need to split it into rows for the id, fname, lname, and parsing the projects and proddates into separate records. I have found many posts with split functions and I can get it to work for 1 column, but not 2. When I do 2 columns it permeates the split. ie for John Doe, it gives me records for projA 3 times, once for each of the proddates. I need to coorelate each multivalue project record with only it's respective projdate and not the others.

Any thoughts?

Thanks!

Upvotes: 0

Views: 4548

Answers (4)

senthil
senthil

Reputation: 1

Check this answer (DB: SQL Server 2019)

select  name, Location_Type, b.value as spltrows from Location__c  A
CROSS APPLY string_split(A.Location_Type,';') b
where Location_Type is not null
ORDER BY 1

In Location_Type column we have multi-values like "Mailing;Shipping;Billing"

Upvotes: 0

Jim the Frayed
Jim the Frayed

Reputation: 158

If you use Jeff Moden's "DelimitedSplit8K" splitter (Which I have renamed here "fDelimitedSplit8K") (Ref. Figure 21: The Final "New" Splitter Code, Ready for Testing)
to do the heavy lifting for the splits, the rest becomes fairly straightforward, using CROSS APPLY and WHERE to get the proper joining.

IF object_ID (N'tempdb..#tInputData') is not null 
   DROP TABLE #tInputData

CREATE TABLE #tInputData (
     ID        INT 
        PRIMARY KEY CLUSTERED  -- Add IDENTITY if ID needs to be set at INSERT time
   , FName     VARCHAR (30)
   , LName     VARCHAR (30)
   , Projects  VARCHAR (4000)
   , ProjDates VARCHAR (4000)
)

INSERT INTO #tInputData
         ( ID, FName, LName, Projects, ProjDates )
VALUES
   ( 1, 'John',  'Doe'      , 'projA;projB;projC' , '20150701;20150801;20150901'),
   ( 2, 'Jane',  'Smith'    , 'projD;;projC'      , '20150701;;20150902'),
   ( 3, 'Lisa',  'Anderson' , 'projB;projC'       , '20150801;20150903'),
   ( 4, 'Nancy', 'Johnson'  , 'projB;projC;projE' , '20150601;20150822;20150904'),
   ( 5, 'Chris', 'Edwards'  , 'projA'             , '20150905')

SELECT * FROM #tInputData  -- Take a look at the INSERT results

; WITH ResultSet  AS 
(
   SELECT 
        InData.ID
      , InData.FName
      , InData.LName
      , ProjectList.ItemNumber AS ProjectID
      , ProjectList.Item AS Project
      , DateList.ItemNumber AS DateID
      , DateList.Item AS ProjDate
   FROM #tInputData AS InData
   CROSS APPLY dbo.fDelimitedSplit8K(InData.Projects,';') AS ProjectList
   CROSS APPLY dbo.fDelimitedSplit8K(InData.ProjDates,';') AS DateList
   WHERE DateList.ItemNumber = ProjectList.ItemNumber  -- Links projects and dates in left-to-r1ght order
   AND (ProjectList.Item <> '' AND DateList.Item <> '') -- Ignore input lines when both Projects and ProjDates have no value; note that these aren't NULLs.
)
SELECT 
      ID
    , FName
    , LName
    , Project
    , ProjDate 
FROM ResultSet
ORDER BY ID, Project

Results in

ID  FName  LName     Project  ProjDate  
--  -----  --------  -------  --------  
 1  John   Doe       projA    20150701  
 1  John   Doe       projB    20150801  
 1  John   Doe       projC    20150901  
 2  Jane   Smith     projC    20150902  
 2  Jane   Smith     projD    20150701  
 3  Lisa   Anderson  projB    20150801  
 3  Lisa   Anderson  projC    20150903  
 4  Nancy  Johnson   projB    20150601  
 4  Nancy  Johnson   projC    20150822  
 4  Nancy  Johnson   projE    20150904  
 5  Chris  Edwards   projA    20150905  

This algorithm handles Project and Date lists of equal length. Should one list be shorter than the other for a given row, some special attention will be needed to apply the NULL in the proper place.

-- Cleanup
DROP TABLE #tInputData

Upvotes: 1

Phani
Phani

Reputation: 93

Try with this following query.

SELECT A.ID ,a.fname,a.lname ,a.projects , ltrim(Split.a.value('.', 'VARCHAR(100)')) AS projdates
FROM (SELECT ID , fname, lname , projects, CAST ('' + REPLACE([projdates], ';', '') + '' AS XML) AS String
FROM ) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

Try with this you will get the your expected output.

Thanks.

Upvotes: 0

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239784

You don't say what you're expected results are, but this may be a good starting point:

declare @t table (ID int not null,fname varchar(17) not null,lname varchar(15) not null,
projects varchar(76) not null,projdates varchar(310) not null)
insert into @t(ID,fname,lname,projects,projdates) values
(1,'John', 'Doe',     'projA;projB;projC','20150701;20150801;20150901'),
(2,'Jane', 'Smith',   'projD;;projC',     '20150701;;20150902'        ),
(3,'Lisa', 'Anderson','projB;projC',      '20150801;20150903'         ),
(4,'Nancy','Johnson', 'projB;projC;projE','20150601;20150822;20150904'),
(5,'Chris','Edwards', 'projA',            '20150905'                  )

;With Numbers as (
    select ROW_NUMBER() OVER (ORDER BY Number) n
    from master..spt_values
), ProjectPositions as (
    select ID,n.n
    from @t t
        inner join
        Numbers n
            on SUBSTRING(t.projects,n.n,1) = ';'
    union all
    select ID,0 from @t
    union all
    select ID,LEN(projects)+1 from @t
), ProjectsNumbered as (
    select *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n) rn
    from ProjectPositions
), ProjectPartitions as (
    select n1.ID,n1.n+1 as startat,n2.n as endat,n1.rn
    from ProjectsNumbered n1
            inner join
        ProjectsNumbered n2
            on
                n1.id = n2.id and
                n1.rn = n2.rn -1
), ProDatePositions as (
    select ID,n.n
    from @t t
        inner join
        Numbers n
            on SUBSTRING(t.projdates,n.n,1) = ';'
    union all
    select ID,0 from @t
    union all
    select ID,LEN(projdates)+1 from @t
), ProDateNumbered as (
    select *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n) rn
    from ProDatePositions
), ProDatePartitions as (
    select n1.ID,n1.n+1 as startat,n2.n as endat,n1.rn
    from ProDateNumbered n1
            inner join
        ProDateNumbered n2
            on
                n1.id = n2.id and
                n1.rn = n2.rn -1
)
select
    t.ID,t.fname,t.lname,
    SUBSTRING(projects,pp.startat,pp.endat - pp.startat) as project,
    SUBSTRING(projdates,pdp.startat,pdp.endat - pdp.startat) as projdate
from
    @t t
        inner join
    ProjectPartitions pp
        on
            t.ID = pp.ID
        inner join
    ProDatePartitions pdp
        on
            t.ID = pdp.ID and
            pp.rn = pdp.rn

Results:

ID          fname             lname           project     projdate
----------- ----------------- --------------- ----------- ----------
1           John              Doe             projA       20150701
1           John              Doe             projB       20150801
1           John              Doe             projC       20150901
2           Jane              Smith           projD       20150701
2           Jane              Smith                       
2           Jane              Smith           projC       20150902
3           Lisa              Anderson        projB       20150801
3           Lisa              Anderson        projC       20150903
4           Nancy             Johnson         projB       20150601
4           Nancy             Johnson         projC       20150822
4           Nancy             Johnson         projE       20150904
5           Chris             Edwards         projA       20150905

(It's unclear what you wanted to do for the "empty" project for ID 2)


How it works - we dummy up a Numbers table using ROW_NUMBER() - we're querying an undocumented table in master but we're not using any actual values from the table - it's just known to have a good number of rows. If you have a real numbers table, you can skip that CTE.

We then do the same operations twice - we join the numbers table to our data table and use it to find the positions of ; characters within the string we want to split. We also create a pair of dummy results for position 0 (before the start of the string) and for 1 position past the end of the string. This defines ProjectPositions and ProDatePositions

We numbers these positions using another ROW_NUMBER() (ProjectNumbered, ProDateNumbered and then use that information to join successive rows together (ProjectPartitions, ProDatePartitions). Then end result is that we have computed where we should extract substrings from both strings.

Finally, we join these "paritition" CTEs back to the original data table, and we use row numbers to ensure that we align the partitioning information from the two independent strings.

Upvotes: 0

Related Questions