salvationishere

Reputation: 3511

How to create a TSQL Replace table function?

I am using SSMS 2008 and have a scalar function that takes a text string and removes all Microsoft Word meta tags from it. The tags are enclosed in "<...>", and there can be any number of tags per record in the column.

I based the scalar function on the code below and use it to update each row in this column, but it takes a very long time to complete. Would a table-valued function version be faster? If so, how could I rewrite this function as a table-valued function?

WHILE PATINDEX( '%[%]%', @str ) > 0 
    SET @str = REPLACE( @str, SUBSTRING( @str, 
            PATINDEX( '%[%]%', @str ), 1 ), '' ) 
SELECT @str
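
As posted, the pattern '%[%]%' matches a literal percent sign, so the loop strips % characters rather than tags. A minimal sketch of the same row-by-row idea applied to <...> spans might look like the following; the sample string and the variable names @start and @end are made up for illustration.

```sql
-- Sketch only: removes one '<...>' span per iteration.
declare @str varchar(max), @start int, @end int;
set @str = '<b>Some text</b> and <i>more</i>';

set @start = charindex('<', @str);
while @start > 0
begin
    set @end = charindex('>', @str, @start);
    if @end = 0 break;  -- unmatched '<': stop instead of looping forever

    -- cut out everything from '<' through the matching '>'
    set @str = stuff(@str, @start, @end - @start + 1, '');
    set @start = charindex('<', @str);
end

select @str;  -- Some text and more
```

Like the original loop, this runs once per tag for every row, which is the kind of work that gets slow over a large table.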

Update: the table function almost works, but it turns out I cannot use it as-is. I am trying to apply it to a temp table, the original table does not have an int PK, and I cannot add a column to the original table.

So I tried creating a view based on this table and then adding an int PK column to it afterwards, because creating the view with the additional int PK column ("n") in one step gave me an error:

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'identity'.

But ALTER VIEW does not support adding columns. Is there another way to do this? Here is my original temp table I am trying to modify:

select [progress_note].[note_text], [progress_note].[event_log_id] 
INTO #TEMP_PN
from [evolv_cs].[dbo].[progress_note] 
group by [progress_note].[event_log_id], [progress_note].[note_text]

[note_text] is varchar(max) and event_log_id is uniqueidentifier. [note_text] contains many "<" and ">" characters. How can I modify this dbo.ufn_StripHTML function to work with it?
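
If an int key is needed on the temp table, one option is to generate it while filling #TEMP_PN instead of going through a view. This is only a sketch: the column name n and the alias pn are illustrative, and the ordering by event_log_id is arbitrary.

```sql
-- Assumes #TEMP_PN does not exist yet; drop it first if it does.
select  row_number() over (order by pn.event_log_id) as n,  -- generated int key
        pn.note_text,
        pn.event_log_id
into    #TEMP_PN
from    (   select  note_text, event_log_id
            from    [evolv_cs].[dbo].[progress_note]
            group by event_log_id, note_text
        ) pn;
```

This leaves the original table untouched, since the key exists only in the temp table.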

I tried your latest code and it is super fast! However, I got the following error after it went through 5700 rows:

Msg 537, Level 16, State 2, Line 1
Invalid length parameter passed to the LEFT or SUBSTRING function.

Do you know what this is about?

Upvotes: 1

Views: 786

Answers (1)

nathan_jr

Reputation: 9282

Here's a function I wrote to remove HTML tags (pairs of <...>) using a set-based approach. I am interested to see whether you can repurpose it to strip Word meta tags.

-----------------------------------------------------------
-- 1. create a number table (this is just a utility table)
-----------------------------------------------------------
set nocount on;
if object_id('dbo.Number') is not null
begin
    drop table dbo.Number;
end
go

create table dbo.Number (n int identity(1,1) primary key);

insert dbo.Number default values ;
while scope_identity() < 500
    insert dbo.Number default values ;

----------------------------------------------------------- 
-- 2. create the function (leverages the utility table)
-----------------------------------------------------------
if object_id('dbo.ufn_StripHTML') is not null
begin
    drop function dbo.ufn_StripHTML;
end
go
create function dbo.ufn_StripHTML
    (   @Input      varchar(8000),
        @Delimiter  char(1)
    )
returns varchar(8000)
as
begin

    declare @Output varchar(8000)

    -- turn both tag brackets into the same delimiter, so the text
    -- alternates: outside tag, inside tag, outside tag, ...
    select  @Input = replace(replace(@Input, '<', @Delimiter), '>', @Delimiter)

    -- split on the delimiter via the number table and keep only the
    -- odd-numbered pieces, i.e. the text outside the tags
    select  @Output = isnull(@Output, '') + s
    from    (   select  row_number() over (order by n.n asc) [i],
                        substring(@Delimiter + @Input + @Delimiter, n.n + 1, charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n + 1) - n.n - 1) [s]
                from    dbo.Number n
                where   n.n = charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n) and
                        n.n <= len(@Delimiter + @Input)
            ) d
    where   i % 2 = 1

    return @Output

end
go

-----------------------------------------------------------
--3. Example of calling the function when you query
-----------------------------------------------------------
if object_id('tempdb..#TEMP_PN') is not null
    drop table #TEMP_PN;

create table #TEMP_PN (note_text varchar(max), event_log_id int);

insert into #TEMP_PN
    select '<b>Some very large bolded text here!</b>', 1 union all
    select 'no tags here', 2 union all
    select '<html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>', 3

select  [Strip] = dbo.ufn_StripHTML(note_text, '|'), 
        [Orig] = note_text,
        event_log_id
from    #TEMP_PN

Edit: ported the scalar function to a table-valued function:

create function dbo.ufn_StripHTMLTable
    (   @Input      varchar(8000),
        @Delimiter  char(1)
    )
returns @ret table (OutString varchar(8000))
as
begin
    declare @Output varchar(8000)
    select  @Input = replace(replace(@input, '<', @Delimiter), '>', @Delimiter)

    select @Output = isnull(@Output, '') + s
    from    (   select   row_number() over (order by n.n asc) [i],
                        substring(@Delimiter + @Input + @Delimiter, n.n + 1, charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n + 1) - n.n - 1) [s]
                from    dbo.Number n
                where   n.n = charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n) and
                        n.n <= len(@Delimiter + @Input)
            ) d
    where i % 2 = 1;

    insert into @ret
        values(@Output);

    return;
end
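
A table-valued function is called from the FROM clause rather than the SELECT list. A minimal usage sketch, assuming the #TEMP_PN sample table from step 3 still exists and the function above has been created (the aliases t and s are illustrative):

```sql
select  [Strip] = s.OutString,
        [Orig]  = t.note_text,
        t.event_log_id
from    #TEMP_PN t
cross apply dbo.ufn_StripHTMLTable(t.note_text, '|') s;  -- one output row per input row
```

Note that note_text is varchar(max) while the parameter is varchar(8000), so longer values are silently truncated on the way in. Whether this version is actually faster than the scalar one is worth measuring, since a multi-statement table-valued function is not inlined into the calling query.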

Upvotes: 1
