Stagg

Reputation: 2855

Return a value at a position from STRING_SPLIT in SQL Server 2016

Can I return a value at a particular position with the STRING_SPLIT function in SQL Server 2016 or higher?

I know the order from a SELECT is not guaranteed, but is it with STRING_SPLIT?

DROP TABLE IF EXISTS #split

SELECT 'z_y_x' AS splitIt
INTO #split UNION
SELECT 'a_b_c'

SELECT * FROM #split;

WITH cte
AS (
SELECT      ROW_NUMBER() OVER ( PARTITION BY s.splitIt ORDER BY s.splitIt ) AS position,
            s.splitIt,
            value
FROM        #split s
CROSS APPLY STRING_SPLIT(s.splitIt, '_')
)
SELECT * FROM cte WHERE position = 2

Will this always return the value at the 2nd element? b for a_b_c and y for z_y_x?

I don't understand why Microsoft doesn't return a position indicator column alongside the value for this function.

Upvotes: 8

Views: 11283

Answers (7)

Lukasz Szozda

Reputation: 175924

Using STRING_SPLIT:

STRING_SPLIT ( string , separator [ , enable_ordinal ] )

enable_ordinal

An int or bit expression that serves as a flag to enable or disable the ordinal output column. A value of 1 enables the ordinal column. If enable_ordinal is omitted, NULL, or has a value of 0, the ordinal column is disabled.

The enable_ordinal argument and ordinal output column are currently only supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only).


Query:

SELECT value FROM STRING_SPLIT('part1_part2_part3', '_', 1) WHERE ordinal = 2;
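On a platform where the third argument is available, the same approach can be applied to the OP's temp table; a sketch (CROSS APPLY against #split, column names as in the question):

SELECT s.splitIt, ss.value
FROM #split AS s
CROSS APPLY STRING_SPLIT(s.splitIt, '_', 1) AS ss
WHERE ss.ordinal = 2;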

Upvotes: 2

BitLauncher

Reputation: 693

I just extended @Shnugo's answer to use STRING_ESCAPE, for the case where the split text contains line breaks, Unicode, or other non-JSON-compatible characters.

My test code, with a pipe as separator instead of a comma:

DECLARE @Separator VARCHAR(5) = STRING_ESCAPE('|', 'json'); -- here pipe or use any other separator (even ones escaped by json)
DECLARE @LongText VARCHAR(MAX) = 'Albert says: "baby, listen!"|ve Çağrı söylüyor: "Elma"|1st Line' + CHAR(13) + CHAR(10) + '2nd line';

SELECT * FROM OPENJSON('["' +  REPLACE(STRING_ESCAPE(@LongText, 'json'), @Separator ,'","') + '"]'); -- ok
-- SELECT * FROM OPENJSON('["' +  REPLACE(@LongText, @Separator ,'","') + '"]'); -- fails with: JSON text is not properly formatted. ...

Updated due to comment from Simon Zeinstra
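To answer the OP's positional question with this approach, you can filter on the zero-based [key] column that OPENJSON returns for a JSON array; a sketch reusing the variables above:

SELECT [value]
FROM OPENJSON('["' + REPLACE(STRING_ESCAPE(@LongText, 'json'), @Separator, '","') + '"]')
WHERE [key] = 1; -- zero-based, i.e. the second element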

Upvotes: 2

alejandrodotor8

Reputation: 672

This works (note that PARSENAME handles at most four parts).

Example:

String = "pos1-pos2-pos3"

REVERSE(PARSENAME(REPLACE(REVERSE(String), '-', '.'), 1))

With 1 it returns "pos1"; with 2 it returns "pos2", and so on.
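A minimal runnable sketch applying this trick to the OP's #split table (assumptions: the values contain no '.' characters and have at most four parts, since PARSENAME is limited to four):

SELECT splitIt,
       REVERSE(PARSENAME(REPLACE(REVERSE(splitIt), '_', '.'), 2)) AS secondPart
FROM #split; -- 'y' for 'z_y_x', 'b' for 'a_b_c'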

Upvotes: 0

HispanoSevillano

Reputation: 1

Here is my workaround; I will keep following the question in case a better answer appears.

UPDATED: the original code did not take into account the case where one word contains another.

UPDATE 2: performance was horrible in production, so I had to think of another way; you have it at the end as Option 2, the implementation for a table.

UPDATE 3: added code for a UDF version of the string implementation.

Implementation in a string:

declare @a as nvarchar(100) = 'Lorem ipsum dolor dol ol sit amet. D  Lorem DO ipsum  DOL dolor sit amet. DOLORES ipsum';

WITH T AS (
    SELECT T1.value
            ,charindex(' ' + T1.value + ' ',' ' + @a + ' ' ,0) AS INDX
            ,RN = ROW_NUMBER() OVER (PARTITION BY value order BY value) 
        FROM STRING_SPLIT(@a, ' ') AS T1
        WHERE T1.value <> ''
),
R (VALUE,INDX,RN) AS (
    SELECT * 
    FROM T
    WHERE T.RN = 1
UNION ALL
    SELECT T.VALUE
        ,charindex(' ' + T.value + ' ',' ' + @a + ' ',R.INDX + 1) AS INDX
        ,T.RN
    FROM T
    JOIN R
        ON T.value = R.VALUE
            AND T.RN = R.RN + 1
)
SELECT * FROM R ORDER BY INDX

Result: (screenshot of the result table)

UDF:

CREATE FUNCTION DBO.UDF_get_word(@string nvarchar(100),@wordNumber int)
returns nvarchar(100)
AS
BEGIN   

DECLARE @searchedWord nvarchar(100);

WITH T AS (
    SELECT T1.value
            ,charindex(' ' + T1.value + ' ',' ' + @string + ' ' ,0) AS INDX
            ,RN = ROW_NUMBER() OVER (PARTITION BY value order BY value) 
        FROM STRING_SPLIT(@string, ' ') AS T1
        WHERE T1.value <> ''
),
R (VALUE,INDX,RN) AS (
    SELECT * 
    FROM T
    WHERE T.RN = 1
UNION ALL
    SELECT T.VALUE
        ,charindex(' ' + T.value + ' ',' ' + @string + ' ',R.INDX + 1) AS INDX
        ,T.RN
    FROM T
    JOIN R
        ON T.value = R.VALUE
            AND T.RN = R.RN + 1
)
SELECT @searchedWord = (value) FROM ( SELECT *, ORD = ROW_NUMBER() OVER (ORDER BY INDX) FROM R )AS TBL WHERE ORD = @wordNumber

RETURN @searchedWord
END
GO
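A hypothetical call to the UDF above (assuming it has been created as written; the literal string here is just an example):

SELECT dbo.UDF_get_word('Lorem ipsum dolor sit amet', 3); -- should return 'dolor'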

Modification for a column in a table, OPTION 1:

WITH T AS (
    SELECT T1.stringToBeSplit
            ,T1.column1 -- column1 is an example of a column where stringToBeSplit is the same for more than one record; better to avoid, but if you need it, just follow column1 through the code
            ,T1.column2
            ,T1.value
            ,T1.column3
            /*,...any other column*/
            ,charindex(' ' + T1.value + ' ',' ' + T1.stringToBeSplit + ' ' ,0) AS INDX
            ,RN = ROW_NUMBER() OVER (PARTITION BY T1.column1, T1.stringToBeSplit, T1.value ORDER BY T1.column1, T1.stringToBeSplit, T1.value) -- any column that creates duplicates needs to be added here; as an example I added column1
        FROM (SELECT TOP 10 * FROM YourTable D CROSS APPLY string_split(D.stringToBeSplit,' ')) AS T1
        WHERE T1.value <> ''
),
R (stringToBeSplit, column1, column2, value, column3, INDX, RN) AS (
    SELECT stringToBeSplit, column1, column2, value, column3, INDX, RN
    FROM T
    WHERE T.RN = 1
UNION ALL
    SELECT T.stringToBeSplit, T.column1, column2, T.value, T.column3
        ,charindex(' ' + T.value + ' ',' ' + T.stringToBeSplit + ' ',R.INDX + 1) AS INDX
        ,T.RN
    FROM T
    JOIN R
        ON T.value = R.VALUE AND T.COLUMN1 = R.COLUMN1 -- any column that creates duplicates needs to be added here; as an example I added column1
            AND T.RN = R.RN + 1
)
SELECT * FROM R ORDER BY column1, stringToBeSplit, INDX

Modification for a column in a table, OPTION 2 (the best performance I could get; the main gain came from removing the join and finding a way to properly execute, and stop, the recursive loop of the CTE: from 1:30 for 1,000 rows down to 2 seconds for 30K rows of strings of similar type and length):

WITH T AS (
    SELECT  T1.stringToBeSplit -- no extra columns this time
            ,T1.value
            ,charindex(' ' + T1.value + ' ',' ' + T1.stringToBeSplit + ' ' ,0) AS INDX
            ,RN = ROW_NUMBER() OVER (PARTITION BY T1.stringToBeSplit,T1.value order BY T1.stringToBeSplit,T1.value) -- in the FROM clause use DISTINCT and a WHERE filter if possible
        FROM (SELECT DISTINCT stringToBeSplit, VALUE FROM [your table] D CROSS APPLY string_split(D.stringToBeSplit,' ') WHERE [your filter]) AS T1
        WHERE T1.value <> ''
),
R (stringToBeSplit, value, INDX, RN) AS (
    SELECT stringToBeSplit, value, INDX, RN
    FROM T
    WHERE T.RN = 1
UNION ALL
    SELECT R.stringToBeSplit, R.value
        ,charindex(' ' + R.value + ' ',' ' + R.stringToBeSplit + ' ',R.INDX + 1) AS INDX
        ,R.RN + 1
    FROM R
    WHERE charindex(' ' + R.value + ' ',' ' + R.stringToBeSplit + ' ',R.INDX + 1) <> 0
)
SELECT * FROM R ORDER BY stringToBeSplit, INDX

To get the word ordinal, instead of SELECT * FROM R use:

SELECT stringToBeSplit ,value , ROW_NUMBER() OVER (PARTITION BY stringToBeSplit order BY [indX]) AS ORD FROM R 

If, instead of having one row per word, you prefer one column per word:

SELECT * FROM (SELECT stringToBeSplit, value, ROW_NUMBER() OVER (PARTITION BY stringToBeSplit ORDER BY [indX]) AS ORD FROM R) AS R2
PIVOT (MAX(value) FOR ORD IN ([1],[2],[3])) AS PIV

If you don't want to specify the number of columns, use dynamic SQL with QUOTENAME() as in this link; in my case I only need the first 4 words, the rest are irrelevant for the moment. Below is the code from that page in case the link fails:

DECLARE 
    @columns NVARCHAR(MAX) = '', 
    @sql     NVARCHAR(MAX) = '';

-- select the category names
SELECT 
    @columns+=QUOTENAME(category_name) + ','
FROM 
    production.categories
ORDER BY 
    category_name;

-- remove the last comma
SET @columns = LEFT(@columns, LEN(@columns) - 1);

-- construct dynamic SQL
SET @sql ='
SELECT * FROM   
(
    SELECT 
        category_name, 
        model_year,
        product_id 
    FROM 
        production.products p
        INNER JOIN production.categories c 
            ON c.category_id = p.category_id
) t 
PIVOT(
    COUNT(product_id) 
    FOR category_name IN ('+ @columns +')
) AS pivot_table;';

-- execute the dynamic SQL
EXECUTE sp_executesql @sql;

Last but not least, I'm really looking forward to knowing whether there is an easier way with the same performance, either in SQL Server or in C#. I just think that everything that does not use external information should stay in the server and run as a query or batch, but to be honest I'm not sure, as I've heard the contrary (especially from people who use pandas), though no one has convinced me just yet.

Upvotes: 0

Aliaksandr Panko

Reputation: 43

I didn't want to deal with OPENJSON, but still wanted to get a string_split() value by index. Performance was not an issue in my case.

I used a CTE (Common Table Expression).

Assume you have the string str = "part1 part2 part3".

WITH split_res_list as
( 
    SELECT value FROM STRING_SPLIT('part1 part2 part3', ' ')
),
split_res_list_with_index as 
(
    SELECT [value],
           ROW_NUMBER() OVER (ORDER BY [value] ASC) as [RowNumber]
    FROM split_res_list
)
SELECT * FROM split_res_list_with_index WHERE RowNumber = 2

BUT: please be aware that the order of the 3 parts is changed according to the ORDER BY condition!

The output is the row with RowNumber = 2, whose value is "part2".
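If the original element order matters and the parts within one string are distinct, a possible workaround (a sketch, not part of the original answer) is to number the parts by their position in the source string instead:

WITH split_res_list as
(
    SELECT value FROM STRING_SPLIT('part1 part2 part3', ' ')
),
split_res_list_with_index as
(
    SELECT [value],
           ROW_NUMBER() OVER (ORDER BY CHARINDEX(' ' + [value] + ' ', ' part1 part2 part3 ')) as [RowNumber]
    FROM split_res_list
)
SELECT * FROM split_res_list_with_index WHERE RowNumber = 2 -- 'part2', now by original position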

Upvotes: 4

Gottfried Lesigang

Reputation: 67311

There is - starting with v2016 - a solution via FROM OPENJSON():

DECLARE @str VARCHAR(100) = 'val1,val2,val3';

SELECT *
FROM OPENJSON('["' +  REPLACE(@str,',','","') + '"]');

The result:

key value   type
0   val1    1
1   val2    1
2   val3    1

The documentation states it clearly:

When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.

For your case this would be:

SELECT 'z_y_x' AS splitIt
INTO #split UNION
SELECT 'a_b_c'

DECLARE @delimiter CHAR(1)='_';

SELECT * 
FROM #split
CROSS APPLY OPENJSON('["' +  REPLACE(splitIt,@delimiter,'","') + '"]') s
WHERE s.[key]=1; --zero based

Let's hope that future versions of STRING_SPLIT() will include this information.

UPDATE: Performance tests, comparing against the popular Jeff Moden splitter

Try this out:

USE master;
GO

CREATE DATABASE dbTest;
GO

USE dbTest;
GO
--Jeff Moden's splitter
CREATE FUNCTION [dbo].[DelimitedSplit8K](@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
GO
--Avoid first call bias
SELECT * FROM dbo.DelimitedSplit8K('a,b,c',',');
GO  

--Table to keep the results
CREATE TABLE Results(ID INT IDENTITY,ResultSource VARCHAR(100),durationMS INT, RowsCount INT);
GO
--Table with strings to split
CREATE TABLE dbo.DelimitedItems(ID INT IDENTITY,DelimitedNString nvarchar(4000),DelimitedString varchar(8000));
GO

--Get rows with randomly mixed strings of 100 items
--Try to play with the count of rows (count behind GO) and the count with TOP

INSERT INTO DelimitedItems(DelimitedNString)
SELECT STUFF((
            SELECT TOP 100 ','+REPLACE(v.[name],',',';') 
            FROM master..spt_values v
            WHERE LEN(v.[name])>0
            ORDER BY NewID()
            FOR XML PATH('')),1,1,'')
--Keep it twice in varchar and nvarchar
UPDATE DelimitedItems SET DelimitedString=DelimitedNString;
GO 500 --create 500 differently mixed rows

--The tests

DECLARE @d DATETIME2;

SET @d = SYSUTCDATETIME();
    SELECT DI.ID, DS.Item, DS.ItemNumber
    INTO #TEMP
    FROM dbo.DelimitedItems DI
         CROSS APPLY dbo.DelimitedSplit8K(DI.DelimitedNString,',') DS;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'delimited8K with NVARCHAR(4000)'
      ,(SELECT COUNT(*) FROM #TEMP) AS RowCountInTemp
      ,DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME()) AS Duration_NV_ms_delimitedSplit8K

SET @d = SYSUTCDATETIME();
    SELECT DI.ID, DS.Item, DS.ItemNumber
    INTO #TEMP2
    FROM dbo.DelimitedItems DI
         CROSS APPLY dbo.DelimitedSplit8K(DI.DelimitedString,',') DS;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'delimited8K with VARCHAR(8000)'
      ,(SELECT COUNT(*) FROM #TEMP2) AS RowCountInTemp
      ,DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME()) AS Duration_V_ms_delimitedSplit8K

SET @d = SYSUTCDATETIME();
    SELECT DI.ID, OJ.[Value] AS Item, OJ.[Key] AS ItemNumber
    INTO #TEMP3
    FROM dbo.DelimitedItems DI
         CROSS APPLY OPENJSON('["' +  REPLACE(DI.DelimitedNString,',','","') + '"]') OJ;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'OPENJSON with NVARCHAR(4000)'
      ,(SELECT COUNT(*) FROM #TEMP3) AS RowCountInTemp
      ,DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME()) AS Duration_NV_ms_OPENJSON

SET @d = SYSUTCDATETIME();
    SELECT DI.ID, OJ.[Value] AS Item, OJ.[Key] AS ItemNumber
    INTO #TEMP4
    FROM dbo.DelimitedItems DI
         CROSS APPLY OPENJSON('["' +  REPLACE(DI.DelimitedString,',','","') + '"]') OJ;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'OPENJSON with VARCHAR(8000)'
      ,(SELECT COUNT(*) FROM #TEMP4) AS RowCountInTemp
      ,DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME()) AS Duration_V_ms_OPENJSON
GO
SELECT * FROM Results;
GO

--Clean up

DROP TABLE #TEMP;
DROP TABLE #TEMP2;
DROP TABLE #TEMP3;
DROP TABLE #TEMP4;

USE master;
GO
DROP DATABASE dbTest;

Results:

200 items in 500 rows

1220    delimited8K with NVARCHAR(4000)
 274    delimited8K with VARCHAR(8000)
 417    OPENJSON with NVARCHAR(4000)
 443    OPENJSON with VARCHAR(8000)

100 items in 500 rows

421 delimited8K with NVARCHAR(4000)
140 delimited8K with VARCHAR(8000)
213 OPENJSON with NVARCHAR(4000)
212 OPENJSON with VARCHAR(8000)

100 items in 5 rows

10  delimited8K with NVARCHAR(4000)
5   delimited8K with VARCHAR(8000)
3   OPENJSON with NVARCHAR(4000)
4   OPENJSON with VARCHAR(8000)

5 items in 500 rows

32  delimited8K with NVARCHAR(4000)
30  delimited8K with VARCHAR(8000)
28  OPENJSON with NVARCHAR(4000)
24  OPENJSON with VARCHAR(8000)

--unlimited length (only possible with OPENJSON) --Without a TOP clause while filling
--results in about 500 items in 500 rows

1329    OPENJSON with NVARCHAR(4000)
1117    OPENJSON with VARCHAR(8000)

Conclusion:

  • the popular splitter function does not like NVARCHAR
  • the function is limited to strings within 8k bytes
  • only the case with many items and many rows in VARCHAR puts the splitter function ahead
  • in all other cases OPENJSON seems to be more or less faster...
  • OPENJSON can deal with (almost) unlimited counts
  • OPENJSON requires v2016
  • everybody is waiting for STRING_SPLIT with the position

UPDATE: Added STRING_SPLIT to the test

In the meantime I re-ran the test with two more test sections using STRING_SPLIT(). As the position I had to return a hardcoded value, since this function does not return the part's index.
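A sketch of what such an added section might look like, following the naming conventions of the test script above (assuming it is appended to the same batch so that @d, DelimitedItems and Results are in scope; the hardcoded 0 stands in for the missing ordinal):

SET @d = SYSUTCDATETIME();
    SELECT DI.ID, SS.[value] AS Item, 0 AS ItemNumber --hardcoded: STRING_SPLIT returns no ordinal
    INTO #TEMP5
    FROM dbo.DelimitedItems DI
         CROSS APPLY STRING_SPLIT(DI.DelimitedNString,',') SS;
INSERT INTO Results(ResultSource,RowsCount,durationMS)
SELECT 'STRING_SPLIT with NVARCHAR(4000)'
      ,(SELECT COUNT(*) FROM #TEMP5) AS RowCountInTemp
      ,DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME()) AS Duration_NV_ms_STRING_SPLIT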

In all tested cases OPENJSON was close to STRING_SPLIT and often faster:

5 items in 1000 rows

250 delimited8K with NVARCHAR(4000)
124 delimited8K with VARCHAR(8000) --this function is best with many rows in VARCHAR
203 OPENJSON with NVARCHAR(4000)
204 OPENJSON with VARCHAR(8000)
235 STRING_SPLIT with NVARCHAR(4000)
234 STRING_SPLIT with VARCHAR(8000)

200 items in 30 rows

140 delimited8K with NVARCHAR(4000)
31  delimited8K with VARCHAR(8000)
47  OPENJSON with NVARCHAR(4000)
31  OPENJSON with VARCHAR(8000)
47  STRING_SPLIT with NVARCHAR(4000)
31  STRING_SPLIT with VARCHAR(8000)

100 items in 10,000 rows

8145    delimited8K with NVARCHAR(4000)
2806    delimited8K with VARCHAR(8000) --fast with many rows!
5112    OPENJSON with NVARCHAR(4000)
4501    OPENJSON with VARCHAR(8000)
5028    STRING_SPLIT with NVARCHAR(4000)
5126    STRING_SPLIT with VARCHAR(8000)

Upvotes: 12

Thom A

Reputation: 95830

The simple answer is: no. Microsoft have so far refused to provide the ordinal position as part of the dataset returned by STRING_SPLIT. You'll need to use a different solution, I'm afraid; for example, Jeff Moden's DelimitedSplit8K.

(Yes, I realise this is more or less a link-only answer; however, pasting Jeff's solution here would effectively be plagiarism.)

If you were to use Jeff's solution, then you would be able to do something like:

SELECT *
FROM dbo.DelimitedSplit8K('a,b,c,d,e,f,g,h,i,j,k',',') DS
WHERE ItemNumber = 2;

Of course, you'd likely be passing a column rather than a literal string.
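Passing a column would typically look like this (a sketch against the OP's #split table, assuming DelimitedSplit8K has been created as in the linked article):

SELECT s.splitIt, DS.Item
FROM #split s
CROSS APPLY dbo.DelimitedSplit8K(s.splitIt, '_') DS
WHERE DS.ItemNumber = 2;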

Upvotes: 4
