CarlOnFire
CarlOnFire

Reputation: 23

Replace part of of a string that contains any results from a column in a table

I'm attempting to replace part of a string in a table with specific text if the string contains the exact string from a column in a table.

I've created a list of just the strings i want to modifying into a variable table, then connecting to the target table with a CROSS JOIN, however the amount of data within the two tables means a cross join could end up coming to hundreds of millions of rows, and the vast majority of the strings i want to swap the data out with will likely not contain any of the data from the second table.

There also isn't any method of identifying which records in table 1 will get modified, as there are no identifiers within the string i want to modify. what I have already is below:

DECLARE @Table2 TABLE (T2.Column1 Varchar(100), T2.Column2 INT)


INSERT INTO @Table2
Select
    T3.Column2 + ', ' + T3.Column3,
    T3.Column1
FROM Table3 T3
    WHERE T3.Column1 IN (1,2,3,4)

UPDATE Table1 SET Column1 = REPLACE(T1.Column1, T2.Column1, 'String')
        FROM Table1 T1
        CROSS JOIN @Table2 T2

I've also attempted the same thing using a CROSS APPLY instead of a CROSS JOIN, but the performance remains the same.

This does work, however the performance is poor due to having to join every row on both tables. Is there any method where i can achieve the same result but without joining every row to compare?

EDIT:

sample data:

CREATE TABLE Table1 ([Column1] varchar(4000), [Column2] INT)
CREATE TABLE Table3 ([Column1] INT, [Column2] Varchar(50), [Column3] Varchar(50))

INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example Data To Replace 1: text, example', 1)
INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example Data To Replace 2 example text', 2)
INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example Data To Replace 3', 3)
INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example Data To, Replace 4 extra text', 4)
INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example Data To, Replace 5', 5)
INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example Data To Replace 6', 6)
INSERT [dbo].[Table1] ([Column1], [Column2]) VALUES ('Example, Data To Replace 7', 7)

INSERT [dbo].[Table3] ([Column1], [Column2], [Column3]) VALUES (1, 'text', 'example')
INSERT [dbo].[Table3] ([Column1], [Column2], [Column3]) VALUES (2, 'To', 'Replace')
INSERT [dbo].[Table3] ([Column1], [Column2], [Column3]) VALUES (3, 'Example', 'Data')

I would have expected it to return the following, but its only actually corrected the first line in table1:

Column1                                     Column2
Example Data To Replace 1: String           1
Example Data To Replace 2 example text      2
Example Data To Replace 3                   3
Example Data String 4 extra text            4
Example Data String 5                       5
Example Data To Replace 6                   6
String To Replace 7                         7

Upvotes: 0

Views: 652

Answers (1)

Zohar Peled
Zohar Peled

Reputation: 82474

I don't understand why you use cross join when you can use inner join instead:

UPDATE T1 
SET T1.Column1 = REPLACE(T1.Column1, T2.Column1, 'String')
FROM Table1 As T1 
JOIN Table2 As T2 ON T1.Column1 LIKE '%'+ T2.Column1 +'%'

This will only update the records where there is actually something to update, and should have much better performance then a cross join. If it still suffers a performance issue, you might want to use full text search instead of like.

You can see a live demo on rextester.

Upvotes: 1

Related Questions