user194076

Reputation: 9027

replace value in varchar(max) field with join

I have a table that contains text field with placeholders. Something like this:

Row Notes  
1.  This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2.  Second row...just a ##test#.   

(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).

Now, I have a lookup table that looks like this:

Name             Value
placeholder130    Dog
myPlaceholder     Cat
oneMore           Cow
test              Horse   

(Lookup table will contain anywhere from 10k to 100k records)

I need to find the fastest way to join those placeholders in the strings to the lookup table and replace them with the values. So, my result should look like this (1st row):

This is some notes Dog this Cat, Cow. End.

What I came up with was to split each row into multiple rows, one per placeholder, join them to the lookup table, and then concatenate the records back into the original row with the new values, but it takes around 10-30 seconds on average.

Upvotes: 12

Views: 2694

Answers (7)

OzrenTkalcecKrznaric

Reputation: 5646

And now for some recursive CTE.

If your indexes are set up correctly, this one will be either very fast or very slow; SQL Server always surprises me with performance extremes when it comes to recursive CTEs...

;WITH T AS (
  SELECT
    Row,
    StartIdx = 1,                                  -- 1 as first starting index
    EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
    Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
                                                   -- (first) temp result bounded by indexes
  FROM PlaceholderTable -- **this is your source table**
  UNION ALL
  SELECT
    pt.Row,
    StartIdx = newstartidx,                        -- starting index (calculated in calc1)
    EndIdx = EndIdx + CAST(newendidx as int) + 1,  -- ending index (calculated in calc4 + total offset)
    Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
                                                   -- temp result taken from subquery or original
  FROM 
    T
    JOIN PlaceholderTable pt -- **this is your source table**
      ON pt.Row = T.Row
    CROSS APPLY(
      SELECT newstartidx = EndIdx + 2              -- new starting index moved by 2 from last end ('##')
    ) calc1
    CROSS APPLY(
      SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
                                                   -- current piece of txt we work on
    ) calc2
    CROSS APPLY(
      SELECT patidx = patindex('%##%', newtxt)     -- current index of '##'
    ) calc3
    CROSS APPLY(
      SELECT newendidx = CASE 
        WHEN patidx = 0 THEN len(newtxt) + 1
        ELSE patidx END                            -- if last piece of txt, end with its length
    ) calc4
    CROSS APPLY(
      SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
                                                   -- get the new token
    ) calc5
    OUTER APPLY(
      SELECT newtokensub = Value
      FROM LookupTable
      WHERE Name = newtoken                        -- substitute the token if you can find it in **your lookup table**
    ) calc6
  WHERE newstartidx + len(newtxt) - 1  <= len(pt.Notes)  
                                                   -- recurse while {new starting index} + {length of txt we work on} does not exceed total length
) 
,lastProcessed AS (
  SELECT 
    Row, 
    Result,
    rn = row_number() over(partition by Row order by StartIdx desc)
  FROM T 
)                                                  -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1                                       -- filter out intermediate results (display only last ones)

Upvotes: 0

bvr

Reputation: 4826

Try this

;WITH CTE (org, calc, [Notes], [level]) AS
(
    SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable

    UNION ALL

    SELECT  CTE.org, CTE.[Notes],
        CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
    FROM    CTE
    INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'

)

SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)

SQL Fiddle demo

See the devioblog post for reference.

Upvotes: 4

Gene

Reputation: 47000

To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.

Notes
Id     FragSeq    Text                    SubsId
1      1          'This is some notes '   1
1      2          ' this '                2
1      3          ', '                    3
1      4          '. End.'                null
2      1          'Second row...just a '  4
2      2          '.'                     null

Subs
Id  Name               Value
1   'placeholder130'   'Dog'
2   'myPlaceholder'    'Cat'
3   'oneMore'          'Cow'
4   'test'             'Horse'  

Now we can do the substitutions with a simple join.

SELECT Notes.Text + COALESCE(Subs.Value, '') 
FROM Notes LEFT JOIN Subs 
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq

This produces a list of fragments with substitutions complete. I am not an MS SQL user, but in most dialects of SQL you can concatenate these fragments into a variable quite easily:

DECLARE @Note VARCHAR(8000)
SELECT @Note = COALESCE(@Note, '') + Notes.Text + COALESCE(Subs.Value, '') 
FROM Notes LEFT JOIN Subs 
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq

Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.

Unfortunately I'm not at a location where I can test this, but it ought to work fine.
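For illustration, the template-to-fragments preprocessing could also be sketched client-side. This is a minimal Python analog (function name hypothetical), assuming every placeholder takes the ##name## form; each fragment carries the text before a token plus that token's name, with None standing in for the NULL SubsId of the trailing fragment:

```python
import re

def preprocess(template):
    """Split a note template into (text, token_name) fragments.

    Each fragment is the literal text before a ##token##, paired with
    that token's name; the trailing text gets token_name = None,
    mirroring the NULL SubsId of the last fragment.
    """
    fragments = []
    pos = 0
    for m in re.finditer(r"##(\w+)##", template):
        fragments.append((template[pos:m.start()], m.group(1)))
        pos = m.end()
    fragments.append((template[pos:], None))
    return fragments

print(preprocess("This is some notes ##placeholder130## this "
                 "##myPlaceholder##, ##oneMore##. End."))
```

This reproduces the Notes fragment table above (with names instead of Subs ids).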

Upvotes: 1

nathan_jr

Reputation: 9292

I second the comment that T-SQL is just not suited for this operation, but if you must do it in the database, here is an example using a function to manage the multiple replace statements.

Since each note contains relatively few tokens (5-15) and the lookup is very large (10k-100k), my function first extracts the potential tokens from the input and uses that set to join to your lookup (dbo.[Lookup] below). Scanning each note for an occurrence of every lookup token would be far too much work.

I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.

Note: in your example data the token format was not consistent (##_#, ##_##, #_#). I am guessing this was simply a typo and assume all tokens take the form ##TokenName##.

--setup
    if object_id('dbo.[Lookup]') is not null
        drop table dbo.[Lookup];
    go
    if object_id('dbo.fn_ReplaceLookups') is not null
        drop function dbo.fn_ReplaceLookups;
    go

    create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
    insert into dbo.[Lookup]
        select '##placeholder130##','Dog' union all
        select '##myPlaceholder##','Cat' union all
        select '##oneMore##','Cow' union all
        select '##test##','Horse';
    go

    create function [dbo].[fn_ReplaceLookups](@input varchar(max))
    returns varchar(max)
    as
    begin

        declare @xml xml;
        select @xml = cast(('<r><i>'+replace(@input,'##' ,'</i><i>')+'</i></r>') as xml);

        --extract the potential tokens
        declare @LookupsInString table (LookupName varchar(100) primary key);
        insert into @LookupsInString
            select  distinct '##'+v+'##'
            from    (   select  [v] = r.n.value('(./text())[1]', 'varchar(100)'),
                                [r] = row_number() over (order by n)
                        from    @xml.nodes('r/i') r(n)
                    )d(v,r)
            where   r%2=0;

        --tokenize the input
        select  @input = replace(@input, l.LookupName, l.LookupValue)
        from    dbo.[Lookup] l
        join    @LookupsInString lis on 
                l.LookupName = lis.LookupName;

        return @input;
    end
    go          

--usage
    declare @Notes table ([Id] int primary key, notes varchar(100));
    insert into @Notes
        select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
        select 2, 'Second row...just a ##test##.';

    select  *,
            dbo.fn_ReplaceLookups(notes)
    from    @Notes;

Returns:

Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.
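The token-extraction idea in the function above (split on ## and keep the pieces that sat between a pair of delimiters) can be mimicked outside the database. A rough Python sketch, again assuming the ##TokenName## format; with Python's 0-based indexes the candidates are the odd-indexed pieces, where the XML trick used even 1-based row numbers:

```python
def candidate_tokens(note):
    # Split on '##'; pieces at odd indexes sat between a pair of
    # delimiters, so they are the candidate token names to look up.
    parts = note.split("##")
    return {p for i, p in enumerate(parts) if i % 2 == 1}

print(candidate_tokens("This is some notes ##placeholder130## this "
                       "##myPlaceholder##, ##oneMore##. End."))
```

Only this small candidate set then needs to be matched against the lookup table.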

Upvotes: 4

Mikael Eriksson

Reputation: 138990

You could try to split the string using a numbers table and rebuild it with for xml path.

select (
       select coalesce(L.Value, T.Value)
       from Numbers as N
         cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
         left outer join Lookup as L
           on L.Name = T.Value
       where N.Number <= len(notes) and
             substring('##' + notes, Number, 2) = '##'
       order by N.Number
       for xml path(''), type
       ).value('text()[1]', 'varchar(max)')
from Notes

SQL Fiddle

I borrowed the string splitting from this blog post by Aaron Bertrand

Upvotes: 9

Luis LL

Reputation: 2993

I really don't know how it will perform with 10k+ lookups, but how does plain old dynamic SQL perform?

DECLARE @sqlCommand  NVARCHAR(MAX)
SELECT @sqlCommand  = N'PlaceholderTable.[Notes]'

SELECT @sqlCommand  = 'REPLACE( ' + @sqlCommand  + 
                      ', ''##' + LookupTable.[Name] + '##'', ''' + 
                      LookupTable.[Value] + ''')'  
FROM LookupTable

SELECT @sqlCommand  = 'SELECT *, ' + @sqlCommand  + ' FROM PlaceholderTable'

EXECUTE sp_executesql @sqlCommand

Fiddle demo
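To see what the loop above actually builds, here is a sketch (helper name hypothetical, in Python for brevity) that composes the same nested REPLACE statement. Caution: real dynamic SQL would also need to escape quotes in the lookup values, both to keep the statement valid and to avoid injection:

```python
def build_replace_sql(lookups):
    # Wrap the Notes column in one REPLACE(...) per lookup entry,
    # mirroring the T-SQL string aggregation above.  Values are not
    # quote-escaped here; production code must handle that.
    expr = "PlaceholderTable.[Notes]"
    for name, value in lookups:
        expr = "REPLACE( {}, '##{}##', '{}')".format(expr, name, value)
    return "SELECT *, {} FROM PlaceholderTable".format(expr)

print(build_replace_sql([("placeholder130", "Dog"),
                         ("myPlaceholder", "Cat")]))
```

With a 100k-row lookup table the generated statement nests 100k REPLACE calls, which is likely to hit statement-size and parse-time limits long before it executes.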

Upvotes: 0

Andomar

Reputation: 238246

SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the notes as they arrive.
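A minimal sketch of that client-side approach (Python, assuming every token takes the ##name## form): load the lookup into a dictionary once, then rewrite each note in a single regex pass, leaving unknown tokens untouched.

```python
import re

# Hypothetical in-memory copy of the lookup table, loaded once
lookup = {"placeholder130": "Dog", "myPlaceholder": "Cat",
          "oneMore": "Cow", "test": "Horse"}

token = re.compile(r"##(\w+)##")

def expand(note):
    # One pass per note; tokens missing from the lookup are kept as-is
    return token.sub(lambda m: lookup.get(m.group(1), m.group(0)), note)

print(expand("Second row...just a ##test##."))
# -> Second row...just a Horse.
```

Unlike sequential REPLACE calls, the single pass also cannot accidentally re-match text that a substituted value introduced.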

Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:

; with  Repl as
        (
        select  row_number() over (order by l.name) rn
        ,       Name
        ,       Value
        from    Lookup l
        )
,       Recurse as
        (
        select  Notes
        ,       0 as rn
        from    Notes
        union all
        select  replace(Notes, '##' + l.name + '##', l.value)
        ,       r.rn + 1
        from    Recurse r
        join    Repl l
        on      l.rn = r.rn + 1
        )
select  *
from    Recurse
where   rn = 
        (
        select  count(*)
        from    Lookup
        )
option  (maxrecursion 0)

Example at SQL Fiddle.

Another option is a while loop to keep replacing lookups until no more are found:

declare @notes table (notes varchar(max))

insert  @notes
select  Notes
from    Notes

while 1=1
    begin

    update  n
    set     Notes = replace(n.Notes, '##' + l.name + '##', l.value)
    from    @notes n
    outer apply
            (
            select  top 1 Name
            ,       Value
            from    Lookup l
            where   n.Notes like '%##' + l.name + '##%'
            ) l
    where   l.name is not null

    if @@rowcount = 0
        break
    end   

select  *
from    @notes

Example at SQL Fiddle.

Upvotes: 6
