Reputation: 1
I have the following string stored in a TEXT datatype which I want to extract the values for
Date:
Queue:
File Name:
and return them in their own columns.
STRING:
If you are able to, please correct the issue and resubmit the file.
Date: 10/8/2024
Queue: ENTRY
File Name: TEST_FILE.PDF
Columns:
Date Queue File Name
-------------------------------------------
10/8/2024 ENTRY TEST_FILE.PDF
I have come up with the following code but have been unable to exclude additional information that comes back.
I get the following data returned:
Date Queue File Name
--------------------------------------------------------------
10/8/2024 Queue: ENTRY File Na TEST_FILE.PDF
SELECT
SUBSTRING(CAST(em.body AS NVARCHAR(300)),
CHARINDEX('Date:', CAST(em.body AS NVARCHAR(300))) + 6,
(CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))) - CHARINDEX('Date:', CAST(em.body AS NVARCHAR(300))))) 'Date',
SUBSTRING(CAST(em.body AS NVARCHAR(300)),
CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))) + 7,
(CHARINDEX('File Name:', CAST(em.body AS NVARCHAR(300))) - CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))))) 'Queue',
RIGHT(CAST(em.body AS NVARCHAR(300)), (LEN(CAST(em.body AS NVARCHAR(300))) - 10) - CHARINDEX('File Name:', CAST(em.body AS NVARCHAR(300)))) 'File Name'
FROM
email em WITH(NOLOCK)
I know I need to decrease the length value for the SUBSTRING
calls, but no matter where I put in a value to decrease them, I get the following error:
Msg 537, Level 16, State 3, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function
Upvotes: 0
Views: 61
Reputation: 81905
Just another option using a bit of JSON
Example
Declare @YourTable Table (id int,[SomeCol] varchar(550)) Insert Into @YourTable Values
(1,'Date: 10/8/2024 Queue: ENTRY File Name: TEST_FILE.PDF'),
(2,'Date: 11/9/2024 Queue: ENTRY File Name: OtherFileName.PDF')
Select ID
,B.*
From @YourTable A
Cross Apply (
Select Date = max(case when value='Date:' then LV end)
,Queue = max(case when value='Queue:' then LV end)
,FName = max(case when value='Name:' then LV end)
From (Select *
,LV = lead(value,1) over (order by [key])
From OpenJSON ('["'+replace(string_escape([SomeCol],'json'),' ','","')+'"]')
) B1
) B
Results
ID Date Queue FName
1 10/8/2024 ENTRY TEST_FILE.PDF
2 11/9/2024 ENTRY OtherFileName.PDF
Upvotes: 0
Reputation: 27177
If you find yourself with no choice but to do messy string extractions then the trick is to methodically build up your logic testing each bit at a time.
Personally I like to use a DRY approach, even though its not typical for SQL, because it reduces the chance of the mistakes which can occur when you repeat logic. This can be done with use of the CROSS APPLY
operator.
This is far from the most concise, but is easier (IMO) to build and maintain.
You can see that all I define are the 3 identification strings specified and everything else is derived from that.
CREATE TABLE Email (Body TEXT);
INSERT INTO Email (Body)
VALUES
('If you are able to, please correct the issue and resubmit the file.
Date: 10/8/2024
Queue: ENTRY
File Name: TEST_FILE.PDF');
SELECT
-- Extract the string segments we require
SUBSTRING(em.body, c3.DateEndIdx, c3.QueueStartIdx - c3.DateEndIdx) [Date]
, SUBSTRING(em.body, c3.QueueEndIdx, c3.FileNameStartIdx - c3.QueueEndIdx) Queue
, SUBSTRING(em.body, c3.FileNameEndIdx, c3.EndOfText - c3.FileNameEndIdx) FileName
FROM (
-- Convert to VARCHAR in order to use all string functions
SELECT CONVERT(VARCHAR(MAX), Body) Body
FROM Email
) em
-- Capture the strings we are trying to find
CROSS APPLY (
VALUES (
'Date:'
, 'Queue:'
, 'File Name:'
)
) c1 (DateLabel, QueueLabel, FileNameLabel)
-- Find the starts and ends of the strings we are trying to find
CROSS APPLY (
VALUES (
CHARINDEX(c1.DateLabel, em.body)
, CHARINDEX(c1.QueueLabel, em.body)
, CHARINDEX(c1.FileNameLabel, em.body)
)
) c2 (DateIdx, QueueIdx, FileNameIdx)
CROSS APPLY (
VALUES (
c2.DateIdx
, c2.DateIdx + LEN(c1.DateLabel)
, c2.QueueIdx
, c2.QueueIdx + LEN(c1.QueueLabel)
, c2.FileNameIdx
, c2.FileNameIdx + LEN(c1.FileNameLabel)
, len(em.body) + 1
)
) c3 (DateStartIdx, DateEndIdx, QueueStartIdx, QueueEndIdx, FileNameStartIdx, FileNameEndIdx, EndOfText);
Date | Queue | FileName |
---|---|---|
10/8/2024 |
ENTRY |
TEST_FILE.PDF |
Upvotes: 0
Reputation: 455
You can try content parsing with substring and char string. If required apply LTRIM / RTRIM based on requirement.
DECLARE @data NVARCHAR(100) = 'Date: 10/8/2024 Queue: ENTRY File Name: TEST_FILE.PDF';
SELECT
(SUBSTRING(@data, CHARINDEX('Date: ', @data) + 6, CHARINDEX('Queue:', @data) - CHARINDEX('Date: ', @data) - 6)) AS 'Date',
(SUBSTRING(@data, CHARINDEX('Queue: ', @data) + 7, CHARINDEX('File Name:', @data) - CHARINDEX('Queue:', @data) - 7)) AS 'Queue',
(SUBSTRING(@data, CHARINDEX('File Name: ', @data) + 10, LEN(@data))) AS 'File_Name'
Upvotes: -1