Monu M
Monu M

Reputation: 1

How to Parse text in one column to different columns in TSQL?

I have data in the following format in a column:

Follow-up Date (use start date from OTO): 8-15-2024 
Contract Type (Competitive Win, Meter, New Product Line, Not Eligible): Meter 
Go Live Date (Confirm with TSE): 
Number of attempts to schedule (1, 2, 3, No reponse - collateral sent):  2  
Call completion/collateral sent date:  
Attendees (Meter Specialist, Meter Efficiency Partner, TSE): 
Notes:

How to put it in different columns? For example - Follow-up Date (use start date from OTO) will be one column with value as 8-15-2024, Contract Type (Competitive Win, Meter, New Product Line, Not Eligible) will be other column with value as Meter and so on.

I tried creating SQL with SUBSTRING() and CHARINDEX() and was successful to parse all columns except the last Notes: column. How to do that?

Query -

SELECT chr_id,note,
      TRIM(SUBSTRING(note,
           CHARINDEX('Follow-up Date (use start date from OTO):',note)+41, 
           CHARINDEX('Contract Type (Competitive Win, Meter, New Product Line, Not Eligible):',note,
                     CHARINDEX('Follow-up Date (use start date from OTO):',note)+41)-
                     CHARINDEX('Follow-up Date (use start date from OTO):',note)-41)
          ) AS 'Follow-up Date',
      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Contract Type (Competitive Win, Meter, New Product Line, Not Eligible):',note)+71, 
                 CHARINDEX('Go Live Date (Confirm with TSE):',
                            note,CHARINDEX('Contract Type (Competitive Win, Meter, New Product Line, Not Eligible):',note)+71)-
                 CHARINDEX('Contract Type (Competitive Win, Meter, New Product Line, Not Eligible):',note)-
                 71)) AS 'Contract Type',
      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Go Live Date (Confirm with TSE):',note)+32, 
                 CHARINDEX('Number of attempts to schedule (1, 2, 3, No reponse - collateral sent):',
                            note,CHARINDEX('Go Live Date (Confirm with TSE):',note)+32)-
                 CHARINDEX('Go Live Date (Confirm with TSE):',note)-
                 32)) AS 'Go Live Date',
      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Number of attempts to schedule (1, 2, 3, No reponse - collateral sent):',note)+71, 
                 CHARINDEX('Call completion/collateral sent date:',
                            note,CHARINDEX('Number of attempts to schedule (1, 2, 3, No reponse - collateral sent):',note)+71)-
                 CHARINDEX('Number of attempts to schedule (1, 2, 3, No reponse - collateral sent):',note)-
                 71)) AS 'Number of attempts to schedule',
      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Call completion/collateral sent date:',note)+37, 
                 CHARINDEX('Attendees (Meter Specialist, Meter Efficiency Partner, TSE):',
                            note,CHARINDEX('Call completion/collateral sent date:',note)+37)-
                 CHARINDEX('Call completion/collateral sent date:',note)-
                 37)) AS 'Call completion/collateral sent date',
      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Attendees (Meter Specialist, Meter Efficiency Partner, TSE):',note)+60, 
                 CHARINDEX('Notes:',
                            note,CHARINDEX('Attendees (Meter Specialist, Meter Efficiency Partner, TSE):',note)+60)-
                 CHARINDEX('Attendees (Meter Specialist, Meter Efficiency Partner, TSE):',note)-
                 60)) AS 'Attendees',
   --   TRIM(
      --SUBSTRING(
   --              note,
            --   CHARINDEX('Notes:',note)+6, 
   --              LEN(note)-
            --   CHARINDEX('Notes:',note)-
      --           6)) AS 'Notes'

FROM table1

Upvotes: 0

Views: 52

Answers (1)

T N
T N

Reputation: 10204

First, SQL Server has very poor text parsing capabilities. You would be much better off using an environment and language that supports regular expressions or other text parsing capabilities, so that you can use features to parse your data much more easily.

It would have helped if you had provided the error that you were getting for the last column calculation: "Invalid length parameter passed to the LEFT or SUBSTRING function."

The reason that you are getting this error is that you have an off-by-one error in your final substring length calculation. For an empty notes field, you are calculating a length of -1, triggering the error. For non-empty notes, you are calculating one less than the required length.

If you think of the source data as having one more imaginary delimiter just past the end of the notes field (past the end of the string), the position of that imaginary delimiter would be LEN(note) + 1, not LEN(note). Your corrected substring length calculation should therefore be LEN(note) + 1 - CHARINDEX('Notes:',note) - 6, and the corrected notes extract would be:

      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Notes:', note) + 6, 
                 LEN(note) + 1 -
                 CHARINDEX('Notes:',note) -
                 6)) AS Notes

However, there is a simpler way. If you just want to extract everything following "Notes:" to the end of the string, you can just overstate the length as LEN(note) and the SUBSTRING() function will automatically reduce that down to match the availabable remaining string length.

      TRIM(
      SUBSTRING(
                 note,
                 CHARINDEX('Notes:', note) + 6, 
                 LEN(note)
                 )) AS Notes

You can even use the STUFF() function to remove everything up through the "Notes:" string, leaving the remaining text.

      TRIM(STUFF(note, 1, CHARINDEX('Notes:', note) + 5, '')) AS Notes

Note the + 5 instead of the + 6, since we want to remove up to the last character in "Notes:", not the character after it.

See this db<>fiddle for a demo that includes each of the above techniques.

You likely still have some work to do. If your source data contains line breaks, the CR/LF (CHAR(13) and CHAR(10)) will not be trimmed by the TRIM(...) function as written. However, you can try the syntax TRIM(' ' + CHAR(13) + CHAR(10) FROM ...) to trim spaces and line breaks.

You should also likely convert the data to the proper data types like INT or DATE where appropriate.

Upvotes: 1

Related Questions