zig
zig

Reputation: 4624

How to replace a word which has a specific pattern in text?

I have a ntext (or nvarchar(max)) column which contains HTML text for example:

<P>Hello <IMG title="" border=0 alt="" src="files/1.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/folder1/2.JPG"></P>
<P>&nbsp;</P> World!
<P><IMG title="" border=0 alt="" src="files/folder2/files/3.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/4.JPG"></P>

Now I need to replace each path of the src from files/ to files/new/ (or to other path) but only those with path of files/.

a REPLACE won't work (I think) because it would change also files/folder1/ and files/folder2/.

In the above example, I need to only change the path for 1.JPG and 4.JPG.

How can I do it? (The question is specific to sql-server).

Note: The file names/content can vary. the above HTML is just an example.

Upvotes: 1

Views: 244

Answers (3)

zig
zig

Reputation: 4624

My solution:

DECLARE @input nvarchar(max) = 
'<P>Hello <IMG title="" border=0 alt="" src="files/zzz/1.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/2.PNG"></P>
<P>&nbsp;</P> World!
<P><IMG title="" border=0 alt="" src="files/folder2/files/3.JEPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 src="files/4.PNG" alt=""></P>'


DECLARE @path_old nvarchar(max) = 'files/';
DECLARE @path_new nvarchar(max) = 'files/new/';

DECLARE @i int, @j int;
DECLARE @path_len int = LEN(@path_old);
DECLARE @input_len int = LEN(@input);
DECLARE @start_location int = 1;
DECLARE @output nvarchar(max) = '';
DECLARE @p nvarchar(max);

WHILE 1 = 1 
BEGIN
    SET @i = CHARINDEX('"' + @path_old, @input, @start_location) 
    IF @i <> 0 
    BEGIN
        SET @j = CHARINDEX('"', @input, @i + 1);
        SET @p = SUBSTRING(@input, @i + @path_len + 1, @j - @i - @path_len);
        IF CHARINDEX('/', @p) = 0  
            SET @output = @output + SUBSTRING(@input, @start_location - 1, @i + 2 - @start_location) + @path_new + @p 
        ELSE
            SET @output = @output + SUBSTRING(@input, @start_location - 1, @j + 2 - @start_location);           

        SET @start_location = @j + 2    
        IF @start_location >= @input_len BREAK;
    END
    ELSE
    BEGIN
        SET @output = @output + SUBSTRING(@input, @start_location - 1, @input_len - @start_location + 2);
        BREAK;  
    END
END

PRINT @output

Upvotes: 1

Tyron78
Tyron78

Reputation: 4187

Following a quick approach in order to replace a certain pattern within a nvarchar(max) string. To be more precisely: the code takes a nvarchar(max) variable @x as source and writes the transformed string into a new variable @y. However, I wrote only the part of the string replacement - the "Update" still has to be coded.

DECLARE @x nvarchar(max) = '<P>Hello <IMG title="" border=0 alt="" src="files/1.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/folder1/2.JPG"></P>
<P>&nbsp;</P> World!
<P><IMG title="" border=0 alt="" src="files/folder2/files/3.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/4.JPG"></P>'

DECLARE @brPos int = (SELECT CHARINDEX('</P>', @x));
DECLARE @brPosPrev int = 0;

DECLARE @srcPos int;
DECLARE @SlashPos int;
DECLARE @JPGPos int;

DECLARE @y nvarchar(max) = '';

DECLARE @xPart nvarchar(max);

WHILE (@brPos != 0)
BEGIN
  SET @xPart = SUBSTRING(@x, CASE WHEN @brPosPrev = 0 THEN @brPosPrev ELSE @brPosPrev END, @brPos-CASE WHEN @brPosPrev = 0 THEN @brPosPrev ELSE @brPosPrev END+4)

  SET @srcPos = (SELECT CHARINDEX('src="files/', @xPart))+11;
  SET @JPGPos = (SELECT CHARINDEX('.JPG', @xPart));
  SET @SlashPos = (SELECT CHARINDEX('/', @xPart, @srcPos));

  IF (@JPGPos < @SlashPos OR @SlashPos = 0)
    SET @xPart = REPLACE(@xPart, 'src="files/', 'src="files/new/');

  SET @y = @y + @xPart;

  SET @brPosPrev = @brPos + 4;
  SET @brPos = (SELECT CHARINDEX('</P>', @x, @brPosPrev));

END

SELECT @y

The Patterns I used in the CHARINDEX calls can be moved to variables as well - so you are not bound to use .JPG or whatever.

Upvotes: 2

Serge
Serge

Reputation: 4036

Method 1 - 1 or 2 known file names

If you only need to replace paths for 1.JPG and 4.JPG, then you can write an UPDATE statement:

UPDATE t
    SET html = REPLACE(REPLACE(html, 'src="files/4.JPG"', 'src="files/new/4.JPG"'), 'src="files/1.JPG"', 'src="files/new/1.JPG"')
FROM t
WHERE CHARINDEX('src="files/1.JPG"', html) + CHARINDEX('src="files/4.JPG"', html) > 0;

Or, if you'd rather keep them separate:

UPDATE t
    SET html = REPLACE(html, 'src="files/1.JPG"', 'src="files/new/1.JPG"')
FROM t
WHERE CHARINDEX('src="files/1.JPG"', html) > 0;


UPDATE t
    SET html = REPLACE(html, 'src="files/4.JPG"', 'src="files/new/4.JPG"')
FROM t
WHERE CHARINDEX('src="files/4.JPG"', html) > 0;

Method 2 - potentially large list of files, whose names can be determined from the filesystem

If you need to replace the paths for a large number of files, whose file names can be determined from the filesystem, try this instead:

  1. Enable xp_cmdshell if required (see Enable 'xp_cmdshell' SQL Server).

  2. Create a temporary table to store the file names:

    CREATE TABLE #Images(
        Image_Name VARCHAR(100) NULL
    )
    
  3. List all the files.

    INSERT INTO #Images
    exec xp_cmdshell 'dir D:\Data\*.jpg /b'
    
  4. Remove NULL lines and anything else that's not required.

    DELETE FROM #Images
    WHERE Image_Name IS NULL
    
  5. Generate SQL statements for each file name.

    SELECT '
        UPDATE t
            SET html = REPLACE(html, ''src="files/' + Image_Name + '"'', ''src="files/new/' + Image_Name + '"'')
        FROM t
        WHERE CHARINDEX(''src="files/' + Image_Name + '"'', html) > 0;
    '
    FROM #Images
    
  6. Now simply copy-paste the output from executing the above SELECT statement back and execute the UPDATE queries.

Upvotes: 0

Related Questions