Reputation: 3511
I am using SSMS 2008 and I have the following scalar function to take a text string and remove all metatags from Microsoft Word. The tags are enclosed in "<...>" and there can be any number of tags per record in one column.
I created a scalar function based on the code below to update each row in this column, but it takes a very long time to complete. Would a table function version of this be faster? If so, how could I rewrite this function as a table function?
WHILE PATINDEX( '%[%]%', @str ) > 0
SET @str = REPLACE( @str, SUBSTRING( @str,
PATINDEX( '%[%]%', @str ), 1 ), '' )
SELECT @str
This table function almost works, but it turns out that it doesn't work in my case. The problem is that I am trying to use this function on a temp table, the original table does not have an int PK, and I cannot add a column to the original table.
So I tried creating a view based on this table and then adding a PK int column to it afterwards, because when I tried to create the view with this additional PK int column ("n") directly, it gave me an error:
Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'identity'.
But ALTER VIEW does not support adding columns. Is there another way to do this? Here is my original temp table I am trying to modify:
select [progress_note].[note_text], [progress_note].[event_log_id]
INTO #TEMP_PN
from [evolv_cs].[dbo].[progress_note]
group by [progress_note].[event_log_id], [progress_note].[note_text]
[note_text] is varchar(max) and event_log_id is uniqueidentifier. [note_text] contains a bunch of "<" and ">" chars. How can I modify this dbo.ufn_StripHTML function?
I tried your latest code and it is super fast! However, I got the following error after it went through 5700 rows:
Msg 537, Level 16, State 2, Line 1
Invalid length parameter passed to the LEFT or SUBSTRING function.
Do you know what this is about?
Reputation: 9282
Here's a function I wrote to remove HTML tags (pairs of <...>) using a set-based approach. I am interested to see if you can repurpose it to strip Word meta tags.
-----------------------------------------------------------
-- 1. create a number table (this is just a utility table)
-----------------------------------------------------------
set nocount on;
if object_id('dbo.Number') is not null
begin
drop table dbo.Number;
end
go
create table dbo.Number (n int identity(1,1) primary key);
insert dbo.Number default values ;
while scope_identity() < 500
insert dbo.Number default values ;
-----------------------------------------------------------
-- 2. create the function (leverages the utility table)
-----------------------------------------------------------
if object_id('dbo.ufn_StripHTML') is not null
begin
drop function dbo.ufn_StripHTML;
end
go
create function dbo.ufn_StripHTML
    ( @Input varchar(8000),
      @Delimiter char(1)
    )
returns varchar(8000)
as
begin
    declare @Output varchar(8000)
    -- turn every '<' and '>' into the delimiter so tags become delimited segments
    select @Input = replace(replace(@input, '<', @Delimiter), '>', @Delimiter)
    -- split on the delimiter; odd-numbered segments lie outside the tags, so keep only those
    select @Output = isnull(@Output, '') + s
    from ( select row_number() over (order by n.n asc) [i],
                  substring(@Delimiter + @Input + @Delimiter, n.n + 1, charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n + 1) - n.n - 1) [s]
           from dbo.Number n
           where n.n = charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n) and
                 n.n <= len(@Delimiter + @Input)
         ) d
    where i % 2 = 1
    return @Output
end
go
-----------------------------------------------------------
--3. Example of calling the function when you query
-----------------------------------------------------------
if object_id('tempdb..#TEMP_PN') is not null
    drop table #TEMP_PN;
create table #TEMP_PN (note_text varchar(max), event_log_id int);
insert into #TEMP_PN
select '<b>Some very large bolded text here!</b>', 1 union all
select 'no tags here', 2 union all
select '<html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>', 3
select [Strip] = dbo.ufn_StripHTML(note_text, '|'),
[Orig] = note_text,
event_log_id
from #TEMP_PN
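One thing to watch when repurposing this: as far as I can tell, the function only sees delimiters at positions covered by dbo.Number, so with 500 rows everything from the first tag beyond position 500 onward would be dropped from the output. A minimal sketch of topping the table up to 8000 rows (to match the varchar(8000) parameter), assuming dbo.Number was created as in step 1:

```sql
-- Assumption: dbo.Number exists as created in step 1 (identity column n).
-- Keep adding rows until the table covers every position of a varchar(8000) input.
set nocount on;
while isnull((select max(n) from dbo.Number), 0) < 8000
    insert dbo.Number default values;
```

This only needs to run once; the function itself is unchanged.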
Edit: ported scalar to table
alter function dbo.ufn_StripHTMLTable
    ( @Input varchar(8000),
      @Delimiter char(1)
    )
returns @ret table (OutString varchar(8000))
as
begin
    declare @Output varchar(8000)
    select @Input = replace(replace(@input, '<', @Delimiter), '>', @Delimiter)
    select @Output = isnull(@Output, '') + s
    from ( select row_number() over (order by n.n asc) [i],
                  substring(@Delimiter + @Input + @Delimiter, n.n + 1, charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n + 1) - n.n - 1) [s]
           from dbo.Number n
           where n.n = charindex(@Delimiter, @Delimiter + @Input + @Delimiter, n.n) and
                 n.n <= len(@Delimiter + @Input)
         ) d
    where i % 2 = 1;
    insert into @ret
    values(@Output);
    return;
end
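A sketch of how the table version might be called once per row, assuming the #TEMP_PN table from the question and that '|' never appears in the notes (both are assumptions, so pick a different delimiter if your text can contain it):

```sql
-- Assumptions: #TEMP_PN exists as in the question, dbo.ufn_StripHTMLTable
-- has been created as above, and '|' never occurs in note_text.
select t.event_log_id,
       s.OutString as Strip,
       t.note_text as Orig
from #TEMP_PN t
cross apply dbo.ufn_StripHTMLTable(t.note_text, '|') s;
```

Note that note_text is varchar(max) while the parameter is varchar(8000), so longer notes are silently truncated to 8000 characters when they are passed in. This does not need an int PK on the temp table, since the function is applied to each row directly.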