Reputation: 928
Lets say I have a dataset like this:
Data
I have 23, chickens, but no cats
I have 23, chickensx, but no cats
I have 23, chickens,x but no cats
I want to remove every comma. Unless it is followed or preceding an x. So in this case it should become:
Data
I have 23 chickens but no cats
I have 23 chickensx, but no cats
I have 23 chickens,x but no cats
Any ideas/suggestions on how to do this? I can have multiple comma's in one record and multiple that are followed or preceding an x.
Upvotes: 0
Views: 80
Reputation: 12360
It depends.
As a one-off task – 'I have a load of data I need to manipulate' – you can do it by a sequence of
You can eyeball your data to make sure you choose escape codes not in the data set. e.g.:
CREATE TABLE T (DATA VARCHAR(100));
INSERT INTO T(DATA) VALUES
('I have 23 , chickens, ,x xw, but no cats'),
('I have 23 , chickens, but no ,@, cats'),
('I have 23, chickens, but no cats'),
('I have 23, chickensx, but no cats'),
(', 23!, I have 23, chickens,x, but no cats'),
(' , I have 23, chickens,x but no cats ,x _, _'),
('x,abc , !, x,x,'),
('I have 23 , chickens, but no cats'),
('I have 23!, chickens, but no cats'),
('chickens,x, and ,x,'),
(',x,x,')
Select
Replace(
Replace(
Replace(
Replace(
Replace(
Replace(
Replace(Data,
',x,', '___[]___'),
',x' , '___[___'),
'x,' , '___]___'),
',' , ''),
'___]___' , 'x,'),
'___[___' , ',x'),
'___[]___' , ',x,')
from T
But to do this as a reliable repeatable task on data you don't personally inspect, then @Jeroen-Mostert's point about the CLR or ETL is the better way to go.
You could still do it with crude Replace(Replace(...
. I think. It would have to be
_
[
]
,x,
x,
,x
I think that would be a total of 13 nested Replaces
.
Upvotes: 1
Reputation: 14928
I don't think this kind of work should be in the database but
CREATE TABLE T (DATA VARCHAR(100));
INSERT INTO T(DATA) VALUES
('I have 23 , chickens, ,x xw, but no cats'),
('I have 23 , chickens, but no ,@, cats'),
('I have 23, chickens, but no cats'),
('I have 23, chickensx, but no cats'),
(', 23!, I have 23, chickens,x, but no cats'),
(' , I have 23, chickens,x but no cats ,x _, _'),
('x,abc , !, x,x,');
DECLARE @DATA VARCHAR(MAX) = '';
SELECT @DATA = STRING_AGG(DATA, CHAR(9))
FROM T;
WHILE (SELECT PATINDEX('%[^x,%],[^x,%]%', @DATA)) > 0
BEGIN
SET @DATA = STUFF(@DATA, PATINDEX('%[^x,%],[^x,%]%', @DATA) + 1, 1, '');
END
SELECT *
FROM STRING_SPLIT(@DATA, CHAR(9));
Returns:
+-------------------------------------------+
| value |
+-------------------------------------------+
| I have 23 chickens ,x xw but no cats |
| I have 23 chickens but no @ cats |
| I have 23 chickens but no cats |
| I have 23 chickensx, but no cats |
| 23! I have 23 chickens,x, but no cats |
| I have 23 chickens,x but no cats ,x _ _ |
| x,abc ! x,x, |
+-------------------------------------------+
Or by calling REPLACE()
function 7 times, with CHAR()
as
SELECT Data,
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(Data, ',x,', CHAR(1)),
',x', CHAR(2)
),
'x,', CHAR(9)),
',', ''),
CHAR(1), ',x,'),
CHAR(2), ',x'),
CHAR(9), 'x,') Results
FROM T;
Order is from inside to outside (from the inner replace to the top replace)
',x,'
and replace it with CHAR(1)
.',x'
and relace it with CHAR(2)
.'x,'
and replace it with CHAR(9)
.','
and replace them with ''
.',x,'
.',x'
.'x,'
.Upvotes: 2
Reputation: 175606
You could use multiple REPLACE
:
SELECT col,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
col,',x,','#'),',x','~'),'x,','^'),',',''),'~',',x'),'^','x,'),'#',',x,')
FROM tab
Upvotes: 2
Reputation: 95554
It's not particularly pretty, but you could use REPLACE
to change the character(s) of 'x,'
and 'x,'
to something else, replace all the commas and then change the other characters back:
WITH VTE AS(
SELECT String
FROM (VALUES ('I have 23, chickens, but no cats'),
('I have 23, chickensx, but no cats'),
('I have 23, chickens,x but no cats'))V(String))
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(String,'x,',CHAR(1)),',x',CHAR(2)),',',''),CHAR(2),',x'),CHAR(1),'x,') AS NewString
FROM VTE;
An important thing to note is to choose a character that doesn't appear in your string (hence my choice of CHAR(1)
and CHAR(2)
, as they aren't "typeable" characters).
Upvotes: 1