Reputation: 69
I am trying to identify/extract multiple ID's from a string; with the aim of then using the identified values to create a comma-separated list.
The string primarily contains "useless" text. However, some rows will contain either one or more 'IDs' that are useful and I would like to extract said IDs.
Every ID will have the same three-letter prefix ("EGG") followed by a five-digit numeric value (e.g. 12911).
For example, I have the following strings in a description field in my sample data (TEXT_SAMPLE
):
TEXT_SAMPLE | |
---|---|
1 | Today is a EGG01234 EGG12345 fine day EGG67891 |
2 | Blah Blah Blah |
3 | EGG01234, EGG12345 test test |
I would like the OUTPUT
to be:
TEXT_SAMPLE | OUTPUT | |
---|---|---|
1 | Today is a EGG01234 EGG12345 fine day EGG67891 | EGG01234,EGG12345,EGG67891 |
2 | Blah Blah Blah | null |
3 | EGG01234, EGG12345 test test | EGG01234,EGG12345 |
I appreciate it would be easier to do this via REGEX; but unfortunately, I am limited to SQL Server.
I have attempted the following but it's not quite right:
SELECT
T1.TEXT_EXAMPLE,
CASE
WHEN T1.TEXT_EXAMPLE LIKE CONCAT ('%', 'EGG', '%')
THEN SUBSTRING(T1.TEXT_EXAMPLE, PATINDEX('%EGG%', T1.TEXT_EXAMPLE), LEN(T1.TEXT_EXAMPLE))
ELSE NULL
END AS OUTPUT
FROM
MSSQL_SAMPLE_DATA T1
Output:
TEXT_SAMPLE | OUTPUT | |
---|---|---|
1 | Today is a EGG01234 EGG12345 fine day EGG67891 | EGG01234 EGG12345 fine day EGG67891 |
2 | Blah Blah Blah | null |
3 | EGG01234, EGG12345 test test | EGG01234, EGG12345 test test |
Please feel free to use the script below to CREATE
a Table that contains some sample data:
CREATE TABLE #MSSQL_SAMPLE_DATA_TABLE
(
TEXT_EXAMPLE nvarchar NULL
)
INSERT INTO #MSSQL_SAMPLE_DATA_TABLE ( [TEXT_EXAMPLE])
VALUES ('Today is a EGG01234 EGG12345 fine day EGG67891'),
('Blah Blah Blah'),
('EGG01234, EGG12345 test test')
And when you've finished, the following will DROP
the Table:
IF OBJECT_ID('tempdb..#MSSQL_SAMPLE_DATA_TABLE') IS NOT NULL
DROP TABLE #MSSQL_SAMPLE_DATA_TABLE
Or, if you would rather use CTE:
WITH MSSQL_SAMPLE_DATA (TEXT_EXAMPLE) AS
(
SELECT
'Today is a EGG01234 EGG12345 fine day EGG67891'
UNION
SELECT
'Blah Blah Blah'
UNION
SELECT
'EGG01234, EGG12345 test test'
)
SELECT
T1.*
FROM
MSSQL_SAMPLE_DATA T1
Please let me know if you require additional information, etc.
Upvotes: 0
Views: 606
Reputation: 6638
you can use following query
SELECT T1.TEXT_EXAMPLE,REPLACE(T2.[OUTPUT],',,',',')
FROM MSSQL_SAMPLE_DATA_TABLE T1 LEFT JOIN
(SELECT TEXT_EXAMPLE,STRING_AGG(value,',') AS [OUTPUT]
FROM MSSQL_SAMPLE_DATA_TABLE
CROSS APPLY STRING_SPLIT(TEXT_EXAMPLE, ' ')
WHERE value LIKE CONCAT('%', 'EGG', '%')
GROUP BY TEXT_EXAMPLE) T2 ON t1.TEXT_EXAMPLE = T2.TEXT_EXAMPLE
result in https://dbfiddle.uk
Upvotes: 0
Reputation: 1269623
You can use string_split()
and string_agg()
to accomplish this:
select *
from MSSQL_SAMPLE_DATA_TABLE cross apply
(select string_agg(s.value, ',') as eggs
from string_split(translate(text_example, ',', ' '), ' ') s
where s.value like 'EGG[0-9][0-9][0-9][0-9][0-9]'
) s;
Here is a db<>fiddle.
Upvotes: 2