Reputation: 537
I have data that was sent to me, and I need to normalize it. The data is in a sql table, but each row has multiple multi value columns. An example is the following:
ID fname lname projects projdates
1 John Doe projA;projB;projC 20150701;20150801;20150901
2 Jane Smith projD;;projC 20150701;;20150902
3 Lisa Anderson projB;projC 20150801;20150903
4 Nancy Johnson projB;projC;projE 20150601;20150822;20150904
5 Chris Edwards projA 20150905
Needs too look like this:
ID fname lname projects projdates
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projD 20150701
2 Jane Smith projC 20150902
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
I need to split it into rows for the id, fname, lname, and parsing the projects and proddates into separate records. I have found many posts with split functions and I can get it to work for 1 column, but not 2. When I do 2 columns it permeates the split. ie for John Doe, it gives me records for projA 3 times, once for each of the proddates. I need to coorelate each multivalue project record with only it's respective projdate and not the others.
Any thoughts?
Thanks!
Upvotes: 0
Views: 4548
Reputation: 1
Check this answer (DB: SQL Server 2019)
select name, Location_Type, b.value as spltrows from Location__c A
CROSS APPLY string_split(A.Location_Type,';') b
where Location_Type is not null
ORDER BY 1
In Location_Type column we have multi-values like "Mailing;Shipping;Billing"
Upvotes: 0
Reputation: 158
If you use Jeff Moden's "DelimitedSplit8K" splitter (Which I have renamed here "fDelimitedSplit8K")
(Ref. Figure 21: The Final "New" Splitter Code, Ready for Testing)
to do the heavy lifting for the splits, the rest becomes fairly straightforward, using CROSS APPLY and WHERE to get the proper joining.
IF object_ID (N'tempdb..#tInputData') is not null
DROP TABLE #tInputData
CREATE TABLE #tInputData (
ID INT
PRIMARY KEY CLUSTERED -- Add IDENTITY if ID needs to be set at INSERT time
, FName VARCHAR (30)
, LName VARCHAR (30)
, Projects VARCHAR (4000)
, ProjDates VARCHAR (4000)
)
INSERT INTO #tInputData
( ID, FName, LName, Projects, ProjDates )
VALUES
( 1, 'John', 'Doe' , 'projA;projB;projC' , '20150701;20150801;20150901'),
( 2, 'Jane', 'Smith' , 'projD;;projC' , '20150701;;20150902'),
( 3, 'Lisa', 'Anderson' , 'projB;projC' , '20150801;20150903'),
( 4, 'Nancy', 'Johnson' , 'projB;projC;projE' , '20150601;20150822;20150904'),
( 5, 'Chris', 'Edwards' , 'projA' , '20150905')
SELECT * FROM #tInputData -- Take a look at the INSERT results
; WITH ResultSet AS
(
SELECT
InData.ID
, InData.FName
, InData.LName
, ProjectList.ItemNumber AS ProjectID
, ProjectList.Item AS Project
, DateList.ItemNumber AS DateID
, DateList.Item AS ProjDate
FROM #tInputData AS InData
CROSS APPLY dbo.fDelimitedSplit8K(InData.Projects,';') AS ProjectList
CROSS APPLY dbo.fDelimitedSplit8K(InData.ProjDates,';') AS DateList
WHERE DateList.ItemNumber = ProjectList.ItemNumber -- Links projects and dates in left-to-r1ght order
AND (ProjectList.Item <> '' AND DateList.Item <> '') -- Ignore input lines when both Projects and ProjDates have no value; note that these aren't NULLs.
)
SELECT
ID
, FName
, LName
, Project
, ProjDate
FROM ResultSet
ORDER BY ID, Project
Results in
ID FName LName Project ProjDate
-- ----- -------- ------- --------
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projC 20150902
2 Jane Smith projD 20150701
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
This algorithm handles Project and Date lists of equal length. Should one list be shorter than the other for a given row, some special attention will be needed to apply the NULL in the proper place.
-- Cleanup
DROP TABLE #tInputData
Upvotes: 1
Reputation: 93
Try with this following query.
SELECT A.ID ,a.fname,a.lname ,a.projects ,
ltrim(Split.a.value('.', 'VARCHAR(100)')) AS projdates
FROM (SELECT ID , fname, lname , projects,
CAST ('' + REPLACE([projdates], ';', '') + '' AS XML) AS String
FROM ) AS A CROSS APPLY String.nodes ('/M') AS Split(a);
Try with this you will get the your expected output.
Thanks.
Upvotes: 0
Reputation: 239784
You don't say what you're expected results are, but this may be a good starting point:
declare @t table (ID int not null,fname varchar(17) not null,lname varchar(15) not null,
projects varchar(76) not null,projdates varchar(310) not null)
insert into @t(ID,fname,lname,projects,projdates) values
(1,'John', 'Doe', 'projA;projB;projC','20150701;20150801;20150901'),
(2,'Jane', 'Smith', 'projD;;projC', '20150701;;20150902' ),
(3,'Lisa', 'Anderson','projB;projC', '20150801;20150903' ),
(4,'Nancy','Johnson', 'projB;projC;projE','20150601;20150822;20150904'),
(5,'Chris','Edwards', 'projA', '20150905' )
;With Numbers as (
select ROW_NUMBER() OVER (ORDER BY Number) n
from master..spt_values
), ProjectPositions as (
select ID,n.n
from @t t
inner join
Numbers n
on SUBSTRING(t.projects,n.n,1) = ';'
union all
select ID,0 from @t
union all
select ID,LEN(projects)+1 from @t
), ProjectsNumbered as (
select *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n) rn
from ProjectPositions
), ProjectPartitions as (
select n1.ID,n1.n+1 as startat,n2.n as endat,n1.rn
from ProjectsNumbered n1
inner join
ProjectsNumbered n2
on
n1.id = n2.id and
n1.rn = n2.rn -1
), ProDatePositions as (
select ID,n.n
from @t t
inner join
Numbers n
on SUBSTRING(t.projdates,n.n,1) = ';'
union all
select ID,0 from @t
union all
select ID,LEN(projdates)+1 from @t
), ProDateNumbered as (
select *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n) rn
from ProDatePositions
), ProDatePartitions as (
select n1.ID,n1.n+1 as startat,n2.n as endat,n1.rn
from ProDateNumbered n1
inner join
ProDateNumbered n2
on
n1.id = n2.id and
n1.rn = n2.rn -1
)
select
t.ID,t.fname,t.lname,
SUBSTRING(projects,pp.startat,pp.endat - pp.startat) as project,
SUBSTRING(projdates,pdp.startat,pdp.endat - pdp.startat) as projdate
from
@t t
inner join
ProjectPartitions pp
on
t.ID = pp.ID
inner join
ProDatePartitions pdp
on
t.ID = pdp.ID and
pp.rn = pdp.rn
Results:
ID fname lname project projdate
----------- ----------------- --------------- ----------- ----------
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projD 20150701
2 Jane Smith
2 Jane Smith projC 20150902
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
(It's unclear what you wanted to do for the "empty" project for ID
2)
How it works - we dummy up a Numbers
table using ROW_NUMBER()
- we're querying an undocumented table in master
but we're not using any actual values from the table - it's just known to have a good number of rows. If you have a real numbers table, you can skip that CTE.
We then do the same operations twice - we join the numbers table to our data table and use it to find the positions of ;
characters within the string we want to split. We also create a pair of dummy results for position 0 (before the start of the string) and for 1 position past the end of the string. This defines ProjectPositions
and ProDatePositions
We numbers these positions using another ROW_NUMBER()
(ProjectNumbered
, ProDateNumbered
and then use that information to join successive rows together (ProjectPartitions
, ProDatePartitions
). Then end result is that we have computed where we should extract substrings from both strings.
Finally, we join these "paritition" CTEs back to the original data table, and we use row numbers to ensure that we align the partitioning information from the two independent strings.
Upvotes: 0