DenStudent
DenStudent

Reputation: 928

SQL Server - remove character unless preceding or followed by specific character

Lets say I have a dataset like this:

Data
I have 23, chickens, but no cats
I have 23, chickensx, but no cats
I have 23, chickens,x but no cats

I want to remove every comma. Unless it is followed or preceding an x. So in this case it should become:

Data
I have 23 chickens but no cats
I have 23 chickensx, but no cats
I have 23 chickens,x but no cats

Any ideas/suggestions on how to do this? I can have multiple comma's in one record and multiple that are followed or preceding an x.

Upvotes: 0

Views: 80

Answers (4)

Chris F Carroll
Chris F Carroll

Reputation: 12360

It depends.

As a one-off task – 'I have a load of data I need to manipulate' – you can do it by a sequence of

  • put-in-escape-codes
  • remove commas
  • unescape escape codes.

You can eyeball your data to make sure you choose escape codes not in the data set. e.g.:

CREATE TABLE T (DATA VARCHAR(100));

INSERT INTO T(DATA) VALUES 
('I have 23 , chickens, ,x xw, but no cats'),
('I have 23 , chickens, but no ,@, cats'),
('I have 23, chickens, but no cats'),
('I have 23, chickensx, but no cats'),
(', 23!, I have 23, chickens,x, but no cats'),
(' , I have 23, chickens,x but no cats ,x _, _'),
('x,abc , !, x,x,'),
('I have 23 , chickens, but no cats'),
('I have 23!, chickens, but no cats'),
('chickens,x, and ,x,'),
(',x,x,')


Select
  Replace(
  Replace(
  Replace(
  Replace(
  Replace(
  Replace(
  Replace(Data,
        ',x,', '___[]___'),
        ',x' , '___[___'),
        'x,' , '___]___'),

         ',' , ''),

        '___]___' , 'x,'),
        '___[___' , ',x'),
        '___[]___' , ',x,')

from T

But to do this as a reliable repeatable task on data you don't personally inspect, then @Jeroen-Mostert's point about the CLR or ETL is the better way to go.

You could still do it with crude Replace(Replace(.... I think. It would have to be

  • Choose some escape characters
  • Escape each escape characters (e.g. by doubling each _ [ ]
  • Escape each sequence that matters ,x, x, ,x
  • Remove commas
  • Unescape sequences
  • Unescape escape characters.

I think that would be a total of 13 nested Replaces.

Upvotes: 1

Ilyes
Ilyes

Reputation: 14928

I don't think this kind of work should be in the database but

CREATE TABLE T (DATA VARCHAR(100));

INSERT INTO T(DATA) VALUES 
('I have 23 , chickens, ,x xw, but no cats'),
('I have 23 , chickens, but no ,@, cats'),
('I have 23, chickens, but no cats'),
('I have 23, chickensx, but no cats'),
(', 23!, I have 23, chickens,x, but no cats'),
(' , I have 23, chickens,x but no cats ,x _, _'),
('x,abc , !, x,x,');

DECLARE @DATA VARCHAR(MAX) = '';

SELECT @DATA = STRING_AGG(DATA, CHAR(9))
FROM T;

WHILE (SELECT PATINDEX('%[^x,%],[^x,%]%', @DATA)) > 0
BEGIN
  SET @DATA = STUFF(@DATA, PATINDEX('%[^x,%],[^x,%]%', @DATA) + 1, 1, '');
END

SELECT *
FROM STRING_SPLIT(@DATA, CHAR(9));

Returns:

+-------------------------------------------+
|                   value                   |
+-------------------------------------------+
| I have 23  chickens ,x xw but no cats     |
| I have 23  chickens but no @ cats         |
| I have 23 chickens but no cats            |
| I have 23 chickensx, but no cats          |
|  23! I have 23 chickens,x, but no cats    |
|   I have 23 chickens,x but no cats ,x _ _ |
| x,abc  ! x,x,                             |
+-------------------------------------------+

Demo

Or by calling REPLACE() function 7 times, with CHAR() as

SELECT Data,
       REPLACE(
               REPLACE(
                       REPLACE(
                               REPLACE(
                                       REPLACE(
                                               REPLACE(
                                                       REPLACE(Data, ',x,', CHAR(1)),
                                                       ',x', CHAR(2)
                                                      ),
                                               'x,', CHAR(9)),
                                       ',', ''),
                               CHAR(1), ',x,'),
                       CHAR(2), ',x'),
               CHAR(9), 'x,') Results
FROM T;

Order is from inside to outside (from the inner replace to the top replace)

  • Find ',x,' and replace it with CHAR(1).
  • Find ',x' and relace it with CHAR(2).
  • Find 'x,' and replace it with CHAR(9).
  • Find all ',' and replace them with ''.
  • Get back ',x,'.
  • Get back ',x'.
  • Get back 'x,'.

Demo

Upvotes: 2

Lukasz Szozda
Lukasz Szozda

Reputation: 175606

You could use multiple REPLACE:

SELECT col,
 REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
   col,',x,','#'),',x','~'),'x,','^'),',',''),'~',',x'),'^','x,'),'#',',x,')
FROM tab

db<>fiddle demo

Upvotes: 2

Thom A
Thom A

Reputation: 95554

It's not particularly pretty, but you could use REPLACE to change the character(s) of 'x,' and 'x,' to something else, replace all the commas and then change the other characters back:

WITH VTE AS(
    SELECT String
    FROM (VALUES ('I have 23, chickens, but no cats'),
                 ('I have 23, chickensx, but no cats'),
                 ('I have 23, chickens,x but no cats'))V(String))
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(String,'x,',CHAR(1)),',x',CHAR(2)),',',''),CHAR(2),',x'),CHAR(1),'x,') AS NewString
FROM VTE;

An important thing to note is to choose a character that doesn't appear in your string (hence my choice of CHAR(1) and CHAR(2), as they aren't "typeable" characters).

Upvotes: 1

Related Questions