MAndrews
MAndrews

Reputation: 69

SQL Server: identify / extract multiple values (e.g. IDs) from a STRING to create a comma-separated list

I am trying to identify/extract multiple ID's from a string; with the aim of then using the identified values to create a comma-separated list.

The string primarily contains "useless" text. However, some rows will contain either one or more 'IDs' that are useful and I would like to extract said IDs.

Every ID will have the same three-letter prefix ("EGG") followed by a five-digit numeric value (e.g. 12911).

For example, I have the following strings in a description field in my sample data (TEXT_SAMPLE):

TEXT_SAMPLE
1 Today is a EGG01234 EGG12345 fine day EGG67891
2 Blah Blah Blah
3 EGG01234, EGG12345 test test

I would like the OUTPUT to be:

TEXT_SAMPLE OUTPUT
1 Today is a EGG01234 EGG12345 fine day EGG67891 EGG01234,EGG12345,EGG67891
2 Blah Blah Blah null
3 EGG01234, EGG12345 test test EGG01234,EGG12345

I appreciate it would be easier to do this via REGEX; but unfortunately, I am limited to SQL Server.

I have attempted the following but it's not quite right:

SELECT
    T1.TEXT_EXAMPLE,
    CASE
       WHEN T1.TEXT_EXAMPLE LIKE CONCAT ('%', 'EGG', '%')
          THEN SUBSTRING(T1.TEXT_EXAMPLE, PATINDEX('%EGG%', T1.TEXT_EXAMPLE), LEN(T1.TEXT_EXAMPLE))
          ELSE NULL
    END AS OUTPUT
FROM
    MSSQL_SAMPLE_DATA T1

Output:

TEXT_SAMPLE OUTPUT
1 Today is a EGG01234 EGG12345 fine day EGG67891 EGG01234 EGG12345 fine day EGG67891
2 Blah Blah Blah null
3 EGG01234, EGG12345 test test EGG01234, EGG12345 test test

Please feel free to use the script below to CREATE a Table that contains some sample data:

CREATE TABLE #MSSQL_SAMPLE_DATA_TABLE 
(
    TEXT_EXAMPLE nvarchar NULL
)

INSERT INTO #MSSQL_SAMPLE_DATA_TABLE ( [TEXT_EXAMPLE])
VALUES ('Today is a EGG01234 EGG12345 fine day EGG67891'),  
       ('Blah Blah Blah'),  
       ('EGG01234, EGG12345 test test')

And when you've finished, the following will DROP the Table:

IF OBJECT_ID('tempdb..#MSSQL_SAMPLE_DATA_TABLE') IS NOT NULL 
   DROP TABLE #MSSQL_SAMPLE_DATA_TABLE

Or, if you would rather use CTE:

WITH MSSQL_SAMPLE_DATA (TEXT_EXAMPLE) AS
(
       SELECT
              'Today is a EGG01234 EGG12345 fine day EGG67891'
       UNION
       SELECT
              'Blah Blah Blah'
       UNION
       SELECT
              'EGG01234, EGG12345 test test'
)
SELECT
       T1.*
FROM
       MSSQL_SAMPLE_DATA T1

Please let me know if you require additional information, etc.

Upvotes: 0

Views: 606

Answers (2)

persian-theme
persian-theme

Reputation: 6638

you can use following query

SELECT T1.TEXT_EXAMPLE,REPLACE(T2.[OUTPUT],',,',',') 
FROM  MSSQL_SAMPLE_DATA_TABLE T1 LEFT JOIN
(SELECT TEXT_EXAMPLE,STRING_AGG(value,',') AS [OUTPUT]
  FROM MSSQL_SAMPLE_DATA_TABLE  
    CROSS APPLY STRING_SPLIT(TEXT_EXAMPLE, ' ')
    WHERE value LIKE CONCAT('%', 'EGG', '%')
    GROUP BY TEXT_EXAMPLE) T2 ON t1.TEXT_EXAMPLE = T2.TEXT_EXAMPLE

result in https://dbfiddle.uk

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269623

You can use string_split() and string_agg() to accomplish this:

select *
from MSSQL_SAMPLE_DATA_TABLE cross apply
    (select string_agg(s.value, ',') as eggs
     from string_split(translate(text_example, ',', ' '), ' ') s
     where s.value like 'EGG[0-9][0-9][0-9][0-9][0-9]'
    ) s;

Here is a db<>fiddle.

Upvotes: 2

Related Questions