Reputation: 57
I have a table with two columns of comma-separated strings. The way the data is formatted, the number of comma-separated items in both columns is equal, and the first value in colA is related to the first value in colB, and so on. (It's obviously not a very good data format, but it's what I'm working with.)
If I have the following row (PrimaryKeyID | column1 | column2):
1 | a,b,c | A,B,C
then in this data format, a & A are logically related, b & B, and c & C.
I want to use STRING_SPLIT to split these columns, but using it twice obviously crosses them with each other, resulting in a total of 9 rows:
1 | a | A
1 | b | A
1 | c | A
1 | a | B
1 | b | B
1 | c | B
1 | a | C
1 | b | C
1 | c | C
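(For reference, the crossing query, sketched with the myTable / ID / column1 / column2 names used in my attempt below, is roughly:)
SELECT t.ID, s1.value, s2.value
FROM myTable t
CROSS APPLY STRING_SPLIT(t.column1, ',') s1
CROSS APPLY STRING_SPLIT(t.column2, ',') s2 -- every value of column1 gets paired with every value of column2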
What I want is just the 3 "logically-related" rows:
1 | a | A
1 | b | B
1 | c | C
However, STRING_SPLIT(myCol, ',') doesn't appear to save the string position anywhere.
I have done the following:
SELECT tbl.ID,
t1.Column1Value,
t2.Column2Value
FROM myTable tbl
INNER JOIN (
SELECT t.ID,
ss.value AS Column1Value,
ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) as StringOrder
FROM myTable t
CROSS APPLY STRING_SPLIT(t.column1,',') ss
) t1 ON tbl.ID = t1.ID
INNER JOIN (
SELECT t.ID,
ss.value AS Column2Value,
ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) as StringOrder
FROM myTable t
CROSS APPLY STRING_SPLIT(t.column2,',') ss
) t2 ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder
This appears to work on my small test set, but in my opinion there is no reason to expect it to be guaranteed to work every time. The ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) is obviously a meaningless ordering, but it appears that, in the absence of any real ordering, STRING_SPLIT returns the values in the "default" order they were already in. Is this "expected" behaviour? Can I count on it? Is there any other way of accomplishing what I'm attempting to do?
I got what I wanted (I think) with the following UDF. However it's pretty slow. Any suggestions?
CREATE FUNCTION fn.f_StringSplit(@string VARCHAR(MAX), @delimiter VARCHAR(1))
RETURNS @r TABLE
(
    Position INT,
    String   VARCHAR(255)
)
AS
BEGIN
    DECLARE @current_position INT
    SET @current_position = 1

    -- Peel off one delimited value per iteration, recording its position
    WHILE CHARINDEX(@delimiter, @string) > 0
    BEGIN
        INSERT INTO @r (Position, String)
        VALUES (@current_position, SUBSTRING(@string, 1, CHARINDEX(@delimiter, @string) - 1))

        SET @current_position = @current_position + 1
        SET @string = SUBSTRING(@string, CHARINDEX(@delimiter, @string) + 1, LEN(@string) - CHARINDEX(@delimiter, @string))
    END

    -- Add the last value, after the final delimiter
    INSERT INTO @r (Position, String) VALUES (@current_position, @string)

    RETURN
END
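With that function, a sketch of joining the two splits on Position (same myTable / ID / column1 / column2 names as above) looks like:
SELECT t.ID, s1.String AS Column1Value, s2.String AS Column2Value
FROM myTable t
CROSS APPLY fn.f_StringSplit(t.column1, ',') s1
CROSS APPLY fn.f_StringSplit(t.column2, ',') s2
WHERE s1.Position = s2.Position -- pair the values by their split position instead of crossing them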
Upvotes: 4
Views: 24281
Reputation: 1
Here is a T-SQL function that uses string_split and adds an ordinal column:
drop function if exists [dbo].[varchar_split2];
go

create function [dbo].[varchar_split2]
(
    @text varchar(max),
    @delimiter char(1) = ','
)
returns @result table ([Ordinal] int not null identity(1, 1) primary key, [Value] varchar(128) not null)
as
begin
    insert @result ([Value])
    select
        [Value]
    from
        string_split(@text, @delimiter)
    where
        0 != len([Value]);

    return;
end;
go
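A sketch of applying it to the question's table (assuming the myTable / ID / column1 / column2 names from the question) might be:
select t.ID, s1.[Value] as Column1Value, s2.[Value] as Column2Value
from myTable t
cross apply dbo.varchar_split2(t.column1, ',') s1
cross apply dbo.varchar_split2(t.column2, ',') s2
where s1.Ordinal = s2.Ordinal; -- pair values by their ordinal position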
Upvotes: 0
Reputation: 2268
Fortunately, in newer SQL Server versions (Azure SQL and SQL Server 2022) an optional flag has been added to String_Split to include an "ordinal" column. If you are using a newer version of SQL Server, this finally provides a solution that is logically correct rather than implementation-specific.
New definition:
String_Split(string, separator [, enable_ordinal])
e.g. String_Split('1.2.3', '.', 1)
Example:
with V as (select Value v, Ordinal n from String_Split('1.2.3', '.', 1))
select
(select v from V where n = 1) Major,
(select v from V where n = 2) Minor,
(select v from V where n = 3) Revision
Returns:
Major Minor Revision
----- ----- ---------
1 2 3
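Applied to the two-column question, a sketch (assuming the myTable / ID / column1 / column2 names from the question) would be:
select t.ID, s1.value as Column1Value, s2.value as Column2Value
from myTable t
cross apply String_Split(t.column1, ',', 1) s1
cross apply String_Split(t.column2, ',', 1) s2
where s1.ordinal = s2.ordinal -- pair values by position instead of crossing them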
Upvotes: 6
Reputation: 2268
The only way I've discovered to explicitly maintain the order of the String_Split() function is by using the Row_Number() function with a literal value in the "order by".
For example:
declare @Version nvarchar(128)
set @Version = '1.2.3';
with V as (select value v, Row_Number() over (order by (select 0)) n from String_Split(@Version, '.'))
select
(select v from V where n = 1) Major,
(select v from V where n = 2) Minor,
(select v from V where n = 3) Revision
Returns:
Major Minor Revision
----- ----- ---------
1 2 3
Update: if you are using a newer version of SQL Server, you can now provide an optional third bit argument which indicates that an ordinal column should also be included in the result. See my other answer here for more details.
Upvotes: 19
Reputation: 473
This is very simple — load the split values into a table with an IDENTITY column:
CREATE TABLE #a(
id [INT] IDENTITY(1,1) NOT NULL,
OrgId INT )
INSERT INTO #a
(
OrgId
)
SELECT value FROM STRING_SPLIT('18,44,45,46,47,48,49,50,51,52,53', ',')
Select * from #a
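A sketch of extending this idea to the question's two-column row (hypothetical temp tables #c1 and #c2, same identity trick) could be:
CREATE TABLE #c1( id INT IDENTITY(1,1) NOT NULL, Val VARCHAR(50) )
CREATE TABLE #c2( id INT IDENTITY(1,1) NOT NULL, Val VARCHAR(50) )

INSERT INTO #c1 (Val) SELECT value FROM STRING_SPLIT('a,b,c', ',')
INSERT INTO #c2 (Val) SELECT value FROM STRING_SPLIT('A,B,C', ',')

-- pair the two lists by their identity (insertion) order
SELECT c1.id, c1.Val, c2.Val
FROM #c1 c1
JOIN #c2 c2 ON c2.id = c1.id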
Upvotes: 0
Reputation: 11
Mark, here is a solution I would use. Assume that [column1] in your table holds the "key" values, which are more or less stable, and [column2] holds the corresponding "field" values, which can sometimes be omitted or NULL.
There will be two extractions: one for [column1], which I assume is the key, and another for [column2], which I assume holds the sort of "values" for the "key"; each will be parsed by the STRING_SPLIT function.
These two INDEPENDENT result sets will then be re-numbered based on the time of the operation (which is always sequential). Take note: we renumber not by the field content or the position of the comma, etc., BUT by the timestamp.
Then they get joined back together by LEFT OUTER JOIN, not by INNER JOIN, due to the fact that our "field values" could be omitted while the "keys" will always be there.
Below is the T-SQL code; as this is my first post to this site, I hope it looks OK:
SELECT T1.ID, T1.KeyValue, T2.FieldValue
from (select t1.ID, row_number() OVER (PARTITION BY t1.ID ORDER BY current_timestamp) AS KeyRow, t2.value AS KeyValue
from myTable t1
CROSS APPLY STRING_SPLIT(t1.column1,',') as t2) T1
LEFT OUTER JOIN
(select t1.ID, row_number() OVER (PARTITION BY t1.ID ORDER BY current_timestamp) AS FieldRow, t3.value AS FieldValue
from myTable t1
CROSS APPLY STRING_SPLIT(t1.column2,',') as t3) T2 ON T1.ID = T2.ID AND T1.KeyRow = T2.FieldRow
Upvotes: 1
Reputation: 642
SELECT PrimaryKeyID, t2.items AS column1, t1.items AS column2
FROM [YourTableName]
CROSS APPLY [dbo].[Split](column2) AS t1
CROSS APPLY [dbo].[Split](column1) AS t2
Upvotes: 1
Reputation: 471
I'm a little late to this question, but I was just attempting the same thing with string_split since I've run into a performance problem of late. My experience with string splitters in T-SQL has led me to use recursive CTEs for most things containing fewer than 1,000 delimited values. Ideally, a CLR procedure would be used if you need the ordinal in your string split.
That said, I've come to a similar conclusion as you on getting an ordinal from string_split. You can see the queries and statistics below, which are, in order: the bare string_split function, a CTE RowNumber of string_split, and then my personal string-split CTE function I derived from this awesome write-up. The main difference between my CTE-based function and the one in the write-up is that I made it an Inline-TVF instead of their MultiStatement-TVF implementation; you can read about the differences here.
In my experiments I haven't seen ROW_NUMBER over a constant deviate from the internal order of the delimited string, so I will be using it until such time as I find a problem with it. But if order is imperative in a business setting, I would probably recommend the Moden splitter featured in the first link above (which links to the author's article here), since its performance is right in line with the less safe string_split-with-RowNumber approach.
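The dbo.fn_splitByDelim function used in the benchmark below isn't included in the post; a minimal sketch of an inline recursive-CTE splitter returning an ordinal, written to match the strValue column name used in the benchmark (not the author's actual code), might look like this:
-- Sketch only: inline TVF using a recursive CTE; returns a position and the split value.
-- Callers must add OPTION (MAXRECURSION 0) for long strings, as the benchmark below does.
create function dbo.fn_splitByDelim (@string varchar(max), @delim char(1))
returns table
as
return
with split as
(
    -- anchor: locate the end of the first delimited value
    select
        cast(1 as bigint) as position,
        cast(1 as bigint) as startAt,
        charindex(@delim, @string + @delim) as endAt
    union all
    -- recurse: step past the delimiter and find the end of the next value
    select
        position + 1,
        cast(endAt + 1 as bigint),
        charindex(@delim, @string + @delim, endAt + 1)
    from split
    where endAt < len(@string)
)
select
    position,
    substring(@string, startAt, endAt - startAt) as strValue
from split;
go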
set nocount on;
declare
@iter int = 0,
@rowcount int,
@val varchar(max) = '';
while len(@val) < 1e6
select
@val += replicate(concat(@iter, ','), 8e3),
@iter += 1;
raiserror('Begin string_split Built-In', 0, 0) with nowait;
set statistics time, io on;
select
*
from
string_split(@val, ',')
where
[value] > '';
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End string_split Built-In | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin string_split Built-In with RowNumber', 0, 0) with nowait;
set statistics time, io on;
with cte
as (
select
*,
[group] = 1
from
string_split(@val, ',')
where
[value] > ''
),
cteCount
as (
select
*,
[id] = row_number() over (order by [group])
from
cte
)
select
*
from
cteCount;
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End string_split Built-In with RowNumber | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin Moden String Splitter', 0, 0) with nowait;
set statistics time, io on;
select
*
from
dbo.SplitStrings_Moden(@val, ',')
where
item > '';
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End Moden String Splitter | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin Recursive CTE String Splitter', 0, 0) with nowait;
set statistics time, io on;
select
*
from
dbo.fn_splitByDelim(@val, ',')
where
strValue > ''
option
(maxrecursion 0);
select
@rowcount = @@rowcount;
set statistics time, io off;
The resulting statistics are:
Begin string_split Built-In
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 5325 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End string_split Built-In | Return 331940 Rows
Begin string_split Built-In with RowNumber
SQL Server Execution Times:
CPU time = 2094 ms, elapsed time = 8119 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End string_split Built-In with RowNumber | Return 331940 Rows
Begin Moden String Splitter
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 8734 ms, elapsed time = 9009 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End Moden String Splitter | Return 331940 Rows
Begin Recursive CTE String Splitter
Table 'Worktable'. Scan count 2, logical reads 1991648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 147188 ms, elapsed time = 147480 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End Recursive CTE String Splitter | Return 331940 Rows
Upvotes: 1
Reputation: 1270391
Your idea is fine, but your order by is not using a stable sort. I think it is safer to do:
SELECT tbl.ID, t1.Column1Value, t2.Column2Value
FROM myTable tbl INNER JOIN
(SELECT t.ID, ss.value AS Column1Value,
ROW_NUMBER() OVER (PARTITION BY t.ID
ORDER BY CHARINDEX(',' + ss.value + ',', ',' + t.column1 + ',')
) as StringOrder
FROM myTable t CROSS APPLY
STRING_SPLIT(t.column1,',') ss
) t1
ON tbl.ID = t1.ID INNER JOIN
(SELECT t.ID, ss.value AS Column2Value,
ROW_NUMBER() OVER (PARTITION BY t.ID
ORDER BY CHARINDEX(',' + ss.value + ',', ',' + t.column2 + ',')
) as StringOrder
FROM myTable t CROSS APPLY
STRING_SPLIT(t.column2, ',') ss
) t2
ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder;
Note: This may not work as desired if the strings have non-adjacent duplicates.
Upvotes: 2