Reputation: 3494

Replacing unwanted portions of a string value in SQL column

I have a table in the following format where COL1 contains a unique identifier and COL2 contains a collection of phone numbers followed by a tag (<abc> or <def>) and delimited by pipe (|). Number of phone records in each row is unknown - it may contain just one phone number followed by tag or upto 10.

Table
----------
COL1 : COL2
----------
ID1 : 1234567890<abc>|4312314124<abc>|1232345133<def>|4131234131<abc>|41234134132<def>

I need to copy this data into a new table with the result in following format i.e. remove all portion of the string with the tag <def>.

    Table
    ----------
    COL1 : COL2
    ----------
    ID1 : 1234567890<abc>,4312314124<abc>,4131234131<abc>

What are the best ways to do this to get optimum performance? I need the program to transform data in a table that contains about a million records.

Upvotes: 0

Answers (3)

Amirhossein Yari

Reputation: 2326

I didn't understand your question at first.but for answer you can you following code if you sql server is 2016 or upper.I think it has a good performance

Insert into table2 (ID1)
SELECT 
    STUFF((SELECT [value] +N',' AS 'data()' FROM STRING_SPLIT(ID1,'|') WHERE [value] LIKE'%<abc>' FOR XML PATH(''),TYPE)
    .value('text()[1]','nvarchar(max)'),1,2,N'') AS ID1 
FROM    
    table1

Upvotes: -1

Alan Burstein

Reputation: 7928

If performance is important then I would suggest delimitedSplit8k_Lead. You can just use the pipe as a delimiter to split the string then exclude items (tokens) that don't end with .

DECLARE @table TABLE (COL1 VARCHAR(10), COL2 VARCHAR(1000));
INSERT @table
VALUES
('ID1','1234567890<abc>|4312314124<abc>|1232345133<def>|4131234131<abc>|41234134132<def>'),
('ID2','2662314129<abc>|7868845133<abc>|6831234131<abc>|41234139999<xxx>|1234567999<abc>')

SELECT t.COL1, ds.item
FROM @table t
CROSS APPLY dbo.DelimitedSplit8K_LEAD(t.COL2,'|') ds
WHERE ds.Item LIKE '%<abc>';

Returns

COL1       item
---------- -----------------
ID1        1234567890<abc>
ID1        4312314124<abc>
ID1        4131234131<abc>
ID2        2662314129<abc>
ID2        7868845133<abc>
ID2        6831234131<abc>
ID2        1234567999<abc>

Then you use XML PATH for concatenation like this:

DECLARE @table TABLE (COL1 VARCHAR(10), COL2 VARCHAR(1000));
INSERT @table
VALUES
('ID1','1234567890<abc>|4312314124<abc>|1232345133<def>|4131234131<abc>|41234134132<def>'),
('ID2','2662314129<abc>|7868845133<abc>|6831234131<abc>|41234139999<xxx>|1234567999<abc>')

SELECT t.COL1, stripBadNumbers.newString
FROM @table t
CROSS APPLY 
(VALUES((
  SELECT ds.item
  FROM dbo.DelimitedSplit8K_LEAD(t.COL2,'|') ds
  WHERE ds.Item LIKE '%<abc>'
  FOR XML PATH(''), TYPE
).value('.', 'varchar(1000)'))) stripBadNumbers(newString);

Returns:

COL1       newString
---------- -------------------------------------------------------------------
ID1        1234567890<abc>4312314124<abc>4131234131<abc>
ID2        2662314129<abc>7868845133<abc>6831234131<abc>1234567999<abc>

Upvotes: 1

sticky bit

Reputation: 37487

That string of yours can easily be transformed into some XML basically using replace(). The phone numbers with the right tag can then be selected using XQuery. As a bonus this might work with an arbitrary number of phone numbers.

(I don't get your schema, so I use my own. Translate it into yours yourself.)

CREATE TABLE elbat
             (nmuloc nvarchar(MAX));

INSERT INTO elbat
            (nmuloc)
            VALUES ('1234567890<abc>|4312314124<abc>|1232345133<def>|4131234131<abc>|41234134132<def>');

WITH
cte AS
(
SELECT convert(xml,
               concat('<phonenumbers><phonenumber number="', 
               replace(replace(substring(nmuloc,
                                         1,
                                         len(nmuloc) - 1),
                               '<',
                               '" tag="'),
                       '>|',
                       '"/><phonenumber number="'),
               '"/></phonenumbers>')) phonenumbers
       FROM elbat
)
SELECT stuff((SELECT ',' + nodes.node.value('concat(./@number, "<", ./@tag, ">")',
                                            'nvarchar(max)')
                     FROM cte
                          CROSS APPLY phonenumbers.nodes('/phonenumbers/phonenumber[@tag="abc"]') nodes(node)
                     FOR XML PATH(''),
                             TYPE).value('(.)[1]',
                                         'nvarchar(max)'),
             1,
             1,
             '');

But while you're at it you should really consider to normalize your schema and don't use delimiter separated lists and the also non atomic number and tag combination in a string anymore!

SQL Fiddle

Upvotes: 1

Replacing unwanted portions of a string value in SQL column

Answers (3)

Related Questions