PurpleHaze
PurpleHaze

Reputation: 23

Query to find position in a string before keyword and extract data before that position

Our data is in a SQL Server 2022 (v16.0.1000.6) database. I'm using SSMS v20.2.

When a person makes a change to a record, that change is captured in a log table as a string column called LogText. Each time a change is made, the change is appended to this column. The date of the change is captured in a different column.

My problem is trying to parse out the most current change which would be at the end of the string in the LogText column. The following is an example of an inventory record for a part. The DemandQty and OnHandQty have been changed numerous times with the most current changes at the end of the string. I'm thinking that I'd need to start at the end of the string, find the first occurrence of DemandQty and extract the value after the -> characters. Same with OnHandQty.

My question is twofold:

  1. How do I find the position of -> that comes before DemandQty or OnHandQty in the reversed string?
  2. How do I extract the integer value before the arrow in the reversed string?

Example string:

12:01:07  OnHandQty: 10000.00000000 -> 9743.00000000  12:01:07  OnHandQty: 9743.00000000 -> 10000.00000000  15:24:25  DemandQty: 0 -> 257.00000000  15:31:09  OnHandQty: 10000.00000000 -> 9743.00000000  15:31:09  DemandQty: 257.00000000 -> 127.00000000

Expected result: DemandQty = 127, OnHandQty = 9743

Actual result: NULL

This is what I've come up with for the DemandQty, however the @arrowPos variable is returning the position of <- (in the reversed string) AFTER the first DemandQty rather than before.

DECLARE @inputString NVARCHAR(MAX) = '12:01:07  OnHandQty: 10000.00000000 -> 9743.00000000  12:01:07  OnHandQty: 9743.00000000 -> 10000.00000000  15:24:25  DemandQty: 0 -> 257.00000000  15:31:09  OnHandQty: 10000.00000000 -> 9743.00000000  15:31:09  DemandQty: 257.00000000 -> 127.00000000'
 
-- Reverse the input string to find the last occurrence of 'DemandQty' from the end.
DECLARE @reversedString NVARCHAR(MAX) = REVERSE(@inputString);
 
-- Find the position of the first 'DemandQty' in the reversed string.
DECLARE @demandQtyPos INT = CHARINDEX('ytqdnamed', @reversedString) - 1;  -- 'ytqdnamed' is 'DemandQty' reversed.
 
-- Find the position of the '>-' symbol that comes before the first 'DemandQty' in the reversed string.
DECLARE @arrowPos INT = CHARINDEX('>-', @reversedString, @demandQtyPos);
 
-- Extract the value after '>-' and the space.
DECLARE @result NVARCHAR(50) = 
    CASE 
        WHEN @arrowPos > 0 THEN
            -- Extract the substring after '>-' (ignoring the space), and reverse it back to get the correct order.
            REVERSE(SUBSTRING(@reversedString, @arrowPos + 2, CHARINDEX(' ', @reversedString + ' ', @arrowPos + 2) - @arrowPos - 2))
        ELSE 
            NULL
    END;

------Convert the extracted value to an integer.
SELECT CAST(@result AS INT) AS ExtractedIntegerValue; 

Upvotes: 1

Views: 94

Answers (3)

ValNik
ValNik

Reputation: 5966

For SQL Server version without STRING_SPLIT

DECLARE @inputString NVARCHAR(MAX) = '12:01:07  OnHandQty: 10000.00000000 -> 9743.00000000  12:01:07  OnHandQty: 9743.00000000 -> 10000.00000000  15:24:25  DemandQty: 0 -> 257.00000000  15:31:09  OnHandQty: 10000.00000000 -> 9743.00000000  15:31:09  DemandQty: 257.00000000 -> 127.00000000'
Declare @SearchName NVARCHAR(100)='DemandQty';

select substring(s1,1,charindex(':',s1)-1) sName
  ,substring(s1,charindex(':',s1)+len('->'),charindex('->',s1)-1-charindex(':',s1)-2)fromValue
  ,substring(s1,charindex('->',s1)++len('->')+1
        ,case when charindex(' ',s1,charindex('->',s1)+len('->')+1)>0 then
           charindex(' ',s1,charindex('->',s1)+len('->')+1)-charindex('->',s1)-2
         else len(s1) end
        ) toValue
from(
select 
  case when charindex(@SearchName+':',s0)>0 then
    substring(s0
   ,len(s0)-charindex(':'+reverse(@SearchName),reverse(s0))-len(@SearchName)+1
   ,len(s0)) 
  end s1
  from (select trim(replace(@inputString,'  ',' ')) s0)a
)b
sName fromValue toValue
DemandQty 257.00000000 127.00000000

fiddle

Upvotes: 0

ValNik
ValNik

Reputation: 5966

String parsing does not seem to me to be a very promising approach, since the number of logging values is practically unlimited. I suggest you look at the query option using STRIG_SPLIT, if this is applicable in your version of SQL Server.

Your log entry has a clear structure.

idx part
1 12:01:07
2 OnHandQty:
3 10000.00000000
4 ->
5 9743.00000000

We can use string_split to split the entire string into parts (removing extra spaces). Next, combine into a group of 5 parts, put them back together.

See example

DECLARE @inputString NVARCHAR(MAX) = '12:01:07  OnHandQty: 10000.00000000 -> 9743.00000000  12:01:07  OnHandQty: 9743.00000000 -> 10000.00000000  15:24:25  DemandQty: 0 -> 257.00000000  15:31:09  OnHandQty: 10000.00000000 -> 9743.00000000  15:31:09  DemandQty: 257.00000000 -> 127.00000000'
Declare @searchName NVARCHAR(100)='DemandQty';

select oldValue,newValue
from(
select *
    ,row_number()over(partition by logname order by logtime )rn
from(
  select grp
    ,max(case when idxn=1 then cast(part as time) end) logtime
    ,max(case when idxn=2 then left(part,len(part)-1) end) logName
    ,max(case when idxn=3 then part end) oldValue
    ,max(case when idxn=0 then part end) newValue
  from(
     select s.ordinal as idx, value as part 
       ,(ordinal+4)/5 as grp
       ,ordinal%5 as idxn
     from string_split(replace(@inputString,'  ',' '),' ',1)s
   )a
   group by grp
 )b
)c
where logName=@searchName and rn=1 

oldValue newValue
0 257.00000000

fiddle

And query parsing all rows from table

create table test (id int, logstring varchar(max));
insert into test values
 (1,'12:01:07  OnHandQty: 10000.00000000 -> 9743.00000000  12:01:07  OnHandQty: 9743.00000000 -> 10000.00000000  15:24:25  DemandQty: 0 -> 257.00000000  15:31:09  OnHandQty: 10000.00000000 -> 9743.00000000  15:31:09  DemandQty: 257.00000000 -> 127.00000000')
;
select *
from(
select *
    ,row_number()over(partition by id,logname order by logtime )rn
from(
select id,grp
  ,max(case when idxn=1 then cast(part as time) end) logtime
  ,max(case when idxn=2 then left(part,len(part)-1) end) logName
  ,max(case when idxn=3 then part end) oldValue
  ,max(case when idxn=0 then part end) newValue
from(
select t.id,s.ordinal as idx, value as part 
  ,(ordinal+4)/5 as grp
  ,ordinal%5 as idxn
from test t
cross apply(select * from string_split(replace(logstring,'  ',' '),' ',1))s
)a
group by id,grp
)b
)c
where rn=1
id grp logtime logName oldValue newValue rn
1 3 15:24:25.0000000 DemandQty 0 257.00000000 1
1 1 12:01:07.0000000 OnHandQty 10000.00000000 9743.00000000 1

Before grouping we have

  • string parts (5 for every log),
  • grp - this is number of update
  • idxn - number of part in grp

Table 2.

id idx part grp idxn
1 1 12:01:07 1 1
1 2 OnHandQty: 1 2
1 3 10000.00000000 1 3
1 4 -> 1 4
1 5 9743.00000000 1 0
1 6 12:01:07 2 1
1 7 OnHandQty: 2 2
1 8 9743.00000000 2 3
1 9 -> 2 4
1 10 10000.00000000 2 0
1 11 15:24:25 3 1
1 12 DemandQty: 3 2
1 13 0 3 3
1 14 -> 3 4
1 15 257.00000000 3 0
1 16 15:31:09 4 1
1 17 OnHandQty: 4 2
1 18 10000.00000000 4 3
1 19 -> 4 4
1 20 9743.00000000 4 0
1 21 15:31:09 5 1
1 22 DemandQty: 5 2
1 23 257.00000000 5 3
1 24 -> 5 4
1 25 127.00000000 5 0

fiddle

Upvotes: 0

Thom A
Thom A

Reputation: 95949

I would, honestly, make best attempts to normalise your data, and then work with that data. This makes best attempts to do so, and works for the data you have. Most likely you then want a Top 1 in each group in that result set.

DECLARE @YourString varchar(8000) = '12:01:07  OnHandQty: 10000.00000000 -> 9743.00000000  12:01:07  OnHandQty: 9743.00000000 -> 10000.00000000  15:24:25  DemandQty: 0 -> 257.00000000  15:31:09  OnHandQty: 10000.00000000 -> 9743.00000000  15:31:09  DemandQty: 257.00000000 -> 127.00000000';

WITH Split AS(
    SELECT TRIM(SS.value) AS value,
           SS.ordinal,
           (ss.ordinal+1) / 2 AS item
    FROM STRING_SPLIT(REPLACE(@YourString,'  ','|'),'|',1) SS),
Pivoted AS(
    SELECT S.item,
           MAX(CASE WHEN S.value LIKE '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' THEN S.value END) AS EventTime,
           MAX(CASE WHEN S.value NOT LIKE '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' THEN S.value END) AS EventValue
    FROM Split S
    GROUP BY S.item) --Will also need to group on a unique value if this is coming from a table, not a variable
SELECT P.EventTime,
       V.EventName,
       MAX(CASE SS.ordinal WHEN 1 THEN TRIM(SS.value) END) AS FromValue,
       MAX(CASE SS.ordinal WHEN 2 THEN TRIM(SS.value) END) AS ToValue
FROM Pivoted P  
     CROSS APPLY (VALUES(NULLIF(CHARINDEX(':',P.EventValue),0)))CI(I)
     CROSS APPLY (VALUES(LEFT(P.EventValue,CI.I-1),TRIM(STUFF(P.EventValue,1,CI.I,''))))V(EventName,EventValue)
     CROSS APPLY STRING_SPLIT(REPLACE(V.EventValue,'->','|'),'|',1) SS
GROUP BY P.EventTime,
         P.item,
         V.EventName;--Will also need to group on a unique value if this is coming from a table, not a variable

And, as mentioned, if you needed to do a get the "last" values:

WITH Split AS(
    SELECT TRIM(SS.value) AS value,
           SS.ordinal,
           (ss.ordinal+1) / 2 AS item
    FROM STRING_SPLIT(REPLACE(@YourString,'  ','|'),'|',1) SS),
Pivoted AS(
    SELECT S.item,
           MAX(CASE WHEN S.value LIKE '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' THEN S.value END) AS EventTime,
           MAX(CASE WHEN S.value NOT LIKE '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' THEN S.value END) AS EventValue
    FROM Split S
    GROUP BY S.item), --Will also need to group on a unique value if this is coming from a table, not a variable
RNs AS(
    SELECT P.EventTime,
           V.EventName,
           MAX(CASE SS.ordinal WHEN 1 THEN TRIM(SS.value) END) AS FromValue,
           MAX(CASE SS.ordinal WHEN 2 THEN TRIM(SS.value) END) AS ToValue,
           ROW_NUMBER() OVER (PARTITION BY V.EventName ORDER BY P.EventTime DESC, P.item DESC) AS RN --May need to aklter PARTITION BY is using a column
    FROM Pivoted P  
         CROSS APPLY (VALUES(NULLIF(CHARINDEX(':',P.EventValue),0)))CI(I)
         CROSS APPLY (VALUES(LEFT(P.EventValue,CI.I-1),TRIM(STUFF(P.EventValue,1,CI.I,''))))V(EventName,EventValue)
         CROSS APPLY STRING_SPLIT(REPLACE(V.EventValue,'->','|'),'|',1) SS
    GROUP BY P.EventTime,
             P.item,
             V.EventName) --Will also need to group on a unique value if this is coming from a table, not a variable
SELECT R.EventName,
       R.ToValue
FROM RNs R
WHERE R.RN = 1;

But, of course, as I stated in the comments, data like you have here should never be getting to the database. Fix the design; don't store your data like this. As you have found, working with it is a pain... As a temporary solution, you could use a VIEW, but the long term goal should be fixing things. Future you will thank you.

Upvotes: 1

Related Questions