OfficerSpock
OfficerSpock

Reputation: 1

Parse TEXT datatype column to grab values based on TAGS

I have the following string stored in a TEXT datatype which I want to extract the values for

Date:  
Queue:  
File Name: 

and return them in their own columns.

STRING:

If you are able to, please correct the issue and resubmit the file.        

Date: 10/8/2024  
Queue: ENTRY 
File Name: TEST_FILE.PDF

Columns:

Date           Queue          File Name
-------------------------------------------
10/8/2024      ENTRY          TEST_FILE.PDF

I have come up with the following code but have been unable to exclude additional information that comes back.

I get the following data returned:

Date                      Queue                  File Name
--------------------------------------------------------------
10/8/2024    Queue:       ENTRY    File Na       TEST_FILE.PDF
SELECT
    SUBSTRING(CAST(em.body AS NVARCHAR(300)), 
              CHARINDEX('Date:', CAST(em.body AS NVARCHAR(300))) + 6, 
              (CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))) - CHARINDEX('Date:', CAST(em.body AS NVARCHAR(300))))) 'Date',
    SUBSTRING(CAST(em.body AS NVARCHAR(300)), 
              CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))) + 7, 
              (CHARINDEX('File Name:', CAST(em.body AS NVARCHAR(300))) - CHARINDEX('Queue:', CAST(em.body AS NVARCHAR(300))))) 'Queue',
    RIGHT(CAST(em.body AS NVARCHAR(300)), (LEN(CAST(em.body AS NVARCHAR(300))) - 10) - CHARINDEX('File Name:', CAST(em.body AS NVARCHAR(300)))) 'File Name'
FROM
    email em WITH(NOLOCK)

I know I need to decrease the length value for the SUBSTRING calls, but no matter where I put in a value to decrease them, I get the following error:

Msg 537, Level 16, State 3, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function

Upvotes: 0

Views: 61

Answers (3)

John Cappelletti
John Cappelletti

Reputation: 81905

Just another option using a bit of JSON

Example

Declare @YourTable Table (id int,[SomeCol] varchar(550))  Insert Into @YourTable Values 
 (1,'Date: 10/8/2024 Queue: ENTRY File Name: TEST_FILE.PDF'),
 (2,'Date: 11/9/2024 Queue: ENTRY File Name: OtherFileName.PDF')

Select ID
      ,B.* 
 From @YourTable A
  Cross Apply ( 
   Select Date  = max(case when value='Date:'  then LV end)
         ,Queue = max(case when value='Queue:' then LV end)
         ,FName = max(case when value='Name:'  then LV end)
    From (Select *
                ,LV = lead(value,1) over (order by [key])
          From OpenJSON ('["'+replace(string_escape([SomeCol],'json'),' ','","')+'"]')
         ) B1
  )  B
 

Results

ID      Date        Queue   FName
1       10/8/2024   ENTRY   TEST_FILE.PDF
2       11/9/2024   ENTRY   OtherFileName.PDF

Upvotes: 0

Dale K
Dale K

Reputation: 27177

If you find yourself with no choice but to do messy string extractions then the trick is to methodically build up your logic testing each bit at a time.

Personally I like to use a DRY approach, even though its not typical for SQL, because it reduces the chance of the mistakes which can occur when you repeat logic. This can be done with use of the CROSS APPLY operator.

This is far from the most concise, but is easier (IMO) to build and maintain.

You can see that all I define are the 3 identification strings specified and everything else is derived from that.

CREATE TABLE Email (Body TEXT);
INSERT INTO Email (Body)
VALUES
('If you are able to, please correct the issue and resubmit the file.        

Date: 10/8/2024  
Queue: ENTRY 
File Name: TEST_FILE.PDF');

SELECT
  -- Extract the string segments we require
  SUBSTRING(em.body, c3.DateEndIdx, c3.QueueStartIdx - c3.DateEndIdx) [Date]
  , SUBSTRING(em.body, c3.QueueEndIdx, c3.FileNameStartIdx - c3.QueueEndIdx) Queue
  , SUBSTRING(em.body, c3.FileNameEndIdx, c3.EndOfText - c3.FileNameEndIdx) FileName
FROM (
  -- Convert to VARCHAR in order to use all string functions
  SELECT CONVERT(VARCHAR(MAX), Body) Body
  FROM Email
) em
-- Capture the strings we are trying to find
CROSS APPLY (
  VALUES (
    'Date:'
    , 'Queue:'
    , 'File Name:'
  )
) c1 (DateLabel, QueueLabel, FileNameLabel)
-- Find the starts and ends of the strings we are trying to find
CROSS APPLY (
  VALUES (
    CHARINDEX(c1.DateLabel, em.body)
    , CHARINDEX(c1.QueueLabel, em.body)
    , CHARINDEX(c1.FileNameLabel, em.body)
  )
) c2 (DateIdx, QueueIdx, FileNameIdx)
CROSS APPLY (
  VALUES (
    c2.DateIdx
    , c2.DateIdx + LEN(c1.DateLabel)
    , c2.QueueIdx
    , c2.QueueIdx + LEN(c1.QueueLabel)
    , c2.FileNameIdx
    , c2.FileNameIdx + LEN(c1.FileNameLabel)
    , len(em.body) + 1
  )
) c3 (DateStartIdx, DateEndIdx, QueueStartIdx, QueueEndIdx, FileNameStartIdx, FileNameEndIdx, EndOfText);
Date Queue FileName
10/8/2024
ENTRY
TEST_FILE.PDF

db<>fiddle

Upvotes: 0

Krupa
Krupa

Reputation: 455

You can try content parsing with substring and char string. If required apply LTRIM / RTRIM based on requirement.

DECLARE @data NVARCHAR(100) = 'Date: 10/8/2024 Queue: ENTRY File Name: TEST_FILE.PDF';

SELECT 
    (SUBSTRING(@data, CHARINDEX('Date: ', @data) + 6, CHARINDEX('Queue:', @data) - CHARINDEX('Date: ', @data) - 6)) AS 'Date',
    (SUBSTRING(@data, CHARINDEX('Queue: ', @data) + 7, CHARINDEX('File Name:', @data) - CHARINDEX('Queue:', @data) - 7)) AS 'Queue',
    (SUBSTRING(@data, CHARINDEX('File Name: ', @data) + 10, LEN(@data))) AS 'File_Name'

Upvotes: -1

Related Questions