Stpete111
Stpete111

Reputation: 3447

T-SQL Insert Trigger - extract part of text for insert

I have data blocks being sent to a SQL Server 2017 table - for purposes of minimally-reproducible example, let's say each of the data blocks look something like this:

2:32:34 PM  #0  Info    Entering agent monitor. 
2:32:34 PM  #1  Info    Agent
2:32:34 PM  #1  Warning Error: the pages could not be loaded...
2:32:34 PM  #1  Alert   Setting user
2:32:34 PM  #2  Warning Loading data. 
2:32:34 PM  #2  Info    Setting first browser proxy. Proxy is None 
2:32:34 PM  #3  Error   One or more threads threw Error #123
2:32:35 PM  #3  Info    Loading Data

That's the format the data is getting written to the field in. The actual data blocks are much larger than this, but the important bit to note is the line containing the status Error (line 7). As opposed to inserting the entirety of the data block to a field called DataText, I only want to insert the Error line, and without the leading timestamp or # (i.e. the value of the DataText field for the above would be Error One or more threads threw Error #123. Note that other "Error" message on the third line (actually only considered a warning for our purposes). We do NOT want to pick this up. Only where the actual status is "Error."

With my limited time in SQL, I feel I need to go for some combination of SUBSTRING and maybe PATINDEX? I'm having a little trouble with this one.

This is what I have so far:

CREATE TRIGGER [dbo].[LogParseTrigger] 
ON [dbo].[Logs] 
INSTEAD OF INSERT
AS 
BEGIN
    SET NOCOUNT ON
    
    INSERT INTO dbo.Logs (Id, DataText)
        SELECT 
            i.Id, j.DataText
        FROM 
            Inserted i
        OUTER APPLY 
            (SELECT
                 CAST(SUBSTRING(SUBSTRING(i.DataText, PATINDEX('%#[0-9] Error%', i.DataText), 999), 1, PATINDEX('%???%', SUBSTRING(i.DataText, PATINDEX('%???%', i.DataText), 999) + 'x') - 1) AS nvarchar) AS [DataText]
             FROM 
                 Inserted i) j
END

Note the question marks in the 2nd and 3rd instances of PATINDEX - this is where I'm getting totally lost. Am I even on the right track?

Any suggestions, even if it's a different approach from Substring/PatIndex, would be greatly appreciated.

EDIT #1

It's important to note, with regard to the finding the end of the pattern, it is possible that the error message could extend to a new line. But it would never contain a full timestamp within the error message that we need to capture.

Upvotes: 2

Views: 87

Answers (1)

Alex
Alex

Reputation: 5157

The following solution assumes:

  • the number of characters between start of the line and # is always 13
  • the number of characters between # and beginning of the message text is always 12
  • line breaks are made up of two characters
  • only one error message (the first found) is to be extracted

Explanation of how it works is in the code comments.

DECLARE @a NVARCHAR( MAX ) =
'2:32:34 PM  #0  Info    Entering agent monitor. 
2:32:34 PM  #1  Info    Agent
2:32:34 PM  #1  Warning Error: the pages could not be loaded...
2:32:34 PM  #1  Alert   Setting user
2:32:34 PM  #2  Warning Loading data. 
2:32:34 PM  #2  Info    Setting first browser proxy. Proxy is None 
2:32:34 PM  #3  Error   One or more threads threw Error #123
2:32:35 PM  #3  Info    Loading Data'

-- Third: Check if the "Error" is the last entry and either use ErrorMessageEndPosition or some large number
SELECT LEFT( ErrorMessageStart,
        CASE
            -- When another entry is found, calculate the end of the message based on the position of the next entry
            -- 15 is the number of characters from # to the end of the Error Message (by counting backwards). This also removes line break characters.
            WHEN ISNULL( ErrorMessageEndPosition, 0 ) >= 15 THEN ErrorMessageEndPosition - 15
            -- A large number to make sure that the error message text is not truncated.
            ELSE 10000
        END
            ) AS [DataText]
FROM
    -- Second: Find the position of the next occurrence of "#[0-9]  ". The double space after [0-9] is important as it will remove incorrect matches (e.g. #123)
    ( SELECT ErrorMessageStart, PATINDEX('%#[0-9]  %', ErrorMessageStart ) AS ErrorMessageEndPosition
    FROM
        -- First: Find beginning of the Error message and extract it together with text that follows it.
        -- 12 is the number of characters from # sign to beginning of the actual message
        ( SELECT SUBSTRING( @a, PATINDEX('%#[0-9]  Error%', @a ) + 12, 100000 ) AS ErrorMessageStart ) AS ErrStart
    ) AS ErrEnd

To incorporate this into your trigger simply add my code into the OUTER APPLY part of your trigger replacing @a with i.DataText.

Upvotes: 2

Related Questions