Reputation: 35
I'm a newbie and have had a problem that I searched these forums and found that I needed to have UDF defined function to split a value because SQL Server does not have a built-in function to split a value based on a delimiter. Based on an article (http://www.codeproject.com/Articles/7938/SQL-User-Defined-Function-to-Parse-a-Delimited-Str), I was able to implement the following but it's not working well from a performance perspective. I have about 75000 records and it's taking a long time to run.
The end result is that I need to take all the purchase values (delimitted by |) and put each value with the name into a table. I may at times have 4 values separated by the | or 3,2 none.
Can someone provide some pointers or a different solution? Maybe do this split using a solution like AutoIT off-line? Or in SSIS?
I'm using SQL Server 2012 and SSIS to load the data.
Help!
Thanks
Dave
BEGIN SET NOCOUNT ON;
DECLARE @Staging_Table TABLE
(
ACCTID INT IDENTITY(1,1),
NAME VARCHAR(50),
PURCHASES VARCHAR(255)
)
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('John','Vanilla|Chocolate|Peach')
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('Jack','Chocolate|Vanilla')
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('Mary','Peach|Vanilla|Bean')
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('Peter','Vanilla|Peach')
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('Jane','Bean|Vanilla|Chocolate|Peach')
-- Get the number of rows in the looping table
DECLARE @RowCount INT
SET @RowCount = (SELECT COUNT(ACCTID) FROM @Staging_Table)
-- Declare an iterator
DECLARE @I INT
-- Initialize the iterator
SET @I = 1
-- Loop through the rows of a table @myTable
WHILE (@I <= @RowCount)
BEGIN
-- Declare variables to hold the data which we get after looping each record
DECLARE @NAME VARCHAR(255), @PURCHASES VARCHAR(255)
-- Get the data from table and set to variables
SELECT
@NAME = NAME,
@PURCHASES = PURCHASES
FROM
@Staging_Table
WHERE
ACCTID = @I
-- Display the looped data
SELECT
@I,
@NAME,
t.txt_value
FROM dbo.fn_ParseText2Table(@PURCHASES, '|') as t
SET @I = @I + 1
END
END
Upvotes: 1
Views: 1766
Reputation: 5782
DECLARE @Tally TABLE (N INT)
DECLARE @i AS INT = 1
WHILE @i != 1000
BEGIN
INSERT INTO @Tally (N) VALUES (@i)
SET @i = @i + 1
END
--------------------------------------------------------
DECLARE @Staging_Table TABLE
(ACCTID INT IDENTITY(1, 1)
,NAME VARCHAR(50)
,PURCHASES VARCHAR(255))
INSERT INTO @Staging_Table (NAME,Purchases)
VALUES ('John','Vanilla|Chocolate|Peach')
,('Jack','Chocolate|Vanilla')
,('Mary','Peach|Vanilla|Bean')
,('Peter','Vanilla|Peach')
,('Jane','Bean|Vanilla|Chocolate|Peach')
,('Jane','Bean Vanilla Chocolate Peach')
----------------------------------------------------------
--1 variant:
SELECT E.NAME,f3.Purch
FROM @Staging_Table AS E
INNER JOIN @Tally AS T ON SUBSTRING('|' + E.Purchases, T.N, 1) = '|' AND T.N < LEN(E.Purchases) + 1
CROSS APPLY (SELECT string = SUBSTRING(' ' + E.Purchases + '|', T.N + 1, LEN(E.Purchases) + 1)) f1
CROSS APPLY (SELECT p1 = CHARINDEX('|', string)) f2
CROSS APPLY (SELECT Purch = SUBSTRING(E.Purchases, T.N, p1-1)) f3
ORDER BY E.NAME
----------------------------------------------------------
--2 variant:
SELECT E.NAME ,
( CASE WHEN CHARINDEX('|', S.string) > 0
THEN LEFT(S.string, CHARINDEX('|', S.string) - 1)
ELSE string
END ) AS Purch
FROM @Staging_Table AS E
INNER JOIN @Tally AS T ON SUBSTRING('|' + PURCHASES, T.N, 1) = '|'
AND T.N <= LEN(PURCHASES)
CROSS APPLY ( SELECT String = ( CASE WHEN T.N = 1
THEN ( CASE WHEN CHARINDEX('|',
E.PURCHASES) > 0
THEN LEFT(E.PURCHASES,
CHARINDEX('|',
E.PURCHASES) - 1)
ELSE E.PURCHASES
END )
ELSE SUBSTRING(E.PURCHASES,
T.N, 1000)
END )
) S
ORDER BY E.NAME
Upvotes: 1
Reputation: 7722
SQL Server is slow with string manipulation, so any SQL implementation will be horrible, performance-wise. If you are truly into the speed, you will have to create a CLR function - being properly written, it will beat any SQL on this particular task, hands down.
Another option, which might or might not be a viable solution in your case, is SSIS package.
Meanwhile, you can try to use XML for row splitting - it might be even worse than loops, though (not to mention many other deficiencies):
select t.ACCTID, t.NAME, p.c.value('./@v', 'varchar(50)') as [PurchaseItem]
from @Staging_Table t
cross apply (
select cast('<r><i v="'
+ replace(t.PURCHASES, '|', '"/><i v="')
+ '" /></r>' as xml
) as [XData]
) ca
cross apply ca.XData.nodes('/r[1]/i') p(c);
Upvotes: 0
Reputation: 3704
It is bad practice to store data like this, but I will tell you a solution I have found that works. Ideally, you would do this outside of the SQL server. This works fast enough for me.
CREATE TABLE #tempValues (
name VARCHAR(100),
val VARCHAR(100)
);
Create a temporary table to hold the data. The solution is a recursive function takes the left hand side, and then recalls the functions on the right hand side.
CREATE FUNCTION breakStringUp
@nameTemp VARCHAR(100),
@stringToSplit VARCHAR(100)
BEGIN
IF FIND(@stringToSplit, '|') > 0
BEGIN
INSERT INTO #tempValues
( name, val)
VALUES
(@nameTemp, LEFT(@stringToSplit,
FIND(@stringToSplit, '|'));
breakStringUp(@nameTemp,
RIGHT( /* Recurse into the function, The index is
taken from the right side so invert it
LEN(@stringToSpit)-FIND(...) */)
)
END
ELSE
BEGIN
INSERT INTO #tempValues ..............
END
The above code is a little complicated. I just wrote the pseudocode, but it should work depending on the schema.
DECLARE CURSOR FOR ........
Declare a cursor over the row and iterate. Call the function on each name and corresponding VARCHAR.
SELECT * FROM #tempValues;
And clean up the temporary table
DROP #tempValues;
Upvotes: 0