Reputation: 309
I have a column in my table which looks like this:
Column 1
-------------------------
Space Planning April 2019
Space Décor Planning May 2020
Hidden September 2018
How do I query to remove the month and year words from the column to show me a result like
Column 1
------------------------
Space Planning
Space Décor Planning
Hidden
In general, how to dynamically filter out words from a string in a column using SQL.
Upvotes: 0
Views: 2651
Reputation: 1362
I think this will work:
declare @table table (id int primary key identity,data varchar(100))
insert into @table values
('Space Planning April 2019 ')
, ('Space Décor Planning May 2020 ')
, ('Hidden September 2018');
select
data OriginalData
, replace(data,right(rtrim(data),5),'') ReplacedYear
, case when replace(data,right(rtrim(data),5),'') like '%Jan%' then replace(replace(data,right(rtrim(data),5),''),' January','')
when replace(data,right(rtrim(data),5),'') like '%Feb%' then replace(replace(data,right(rtrim(data),5),''),' February','')
when replace(data,right(rtrim(data),5),'') like '%Mar%' then replace(replace(data,right(rtrim(data),5),''),' March','')
when replace(data,right(rtrim(data),5),'') like '%Apr%' then replace(replace(data,right(rtrim(data),5),''),' April','')
when replace(data,right(rtrim(data),5),'') like '%May%' then replace(replace(data,right(rtrim(data),5),''),' May','')
when replace(data,right(rtrim(data),5),'') like '%Jun%' then replace(replace(data,right(rtrim(data),5),''),' June','')
when replace(data,right(rtrim(data),5),'') like '%Jul%' then replace(replace(data,right(rtrim(data),5),''),' July','')
when replace(data,right(rtrim(data),5),'') like '%Aug%' then replace(replace(data,right(rtrim(data),5),''),' August','')
when replace(data,right(rtrim(data),5),'') like '%Sep%' then replace(replace(data,right(rtrim(data),5),''),' September','')
when replace(data,right(rtrim(data),5),'') like '%Oct%' then replace(replace(data,right(rtrim(data),5),''),' October','')
when replace(data,right(rtrim(data),5),'') like '%Nov%' then replace(replace(data,right(rtrim(data),5),''),' November','')
when replace(data,right(rtrim(data),5),'') like '%Dec%' then replace(replace(data,right(rtrim(data),5),''),' December','')
else replace(data,right(rtrim(data),5),'') end ExpectedResults
from @table;
Results:
EDIT
An alternative solution is to run the following query which gives you the same results (using the same source data above):
with cte as (
select
id
, rank() over (order by data) rnk
, ltrim(rtrim(Value)) value
from @table
outer apply string_split(data,' ')
where ltrim(rtrim(Value)) <> ''
and ltrim(rtrim(Value)) not in ('January','February','March','April','May','June','July','August','September','October','November','December')
and value not in (select right(rtrim(data),4) from @table)
)
select
string_agg([value], ' ') within group (order by id)
from cte
group by rnk
Results:
See DEMO
Upvotes: 0
Reputation: 4454
The way to do it "in general" is to first identify some recognisable pattern that works for all of the data in your specific case. Then you write code which makes use of that pattern. Yeah, I know that sounds weird. Let me explain.
In your case, you seem to have a pattern in the form: "some kind of data", then a space, then a month name, then a space, then a 4 digit year. Is this pattern always consistent? Is there always a 4 digit year at the end? Can you be sure of that? If so then you can remove the month name and year by finding the position of the first space which is more than 5 characters from the end of the value (this will be the space between "the data" and "the month name"), then removing it and everything after it. Here is an example of how you could do that. It's not the only way, I have tried to write an algorithm that you can understand. Read it from the inside out!:
declare @somedata varchar(64) = 'Space Planning May 2019'
select left
(
@somedata,
len(@somedata) - 5 - charindex -- the length of the string excluding the data below
(
' ', -- look for the first space character
reverse -- reverse it
(
left -- get all but the last 5 characters of the value
(
@somedata,
len(@somedata) - 5
)
)
)
)
To get this result for a whole column called yourColumn
in a table which I will assume is called YourTable
, just add the from clause and replace the variable with the actual column name:
select newColumn = left
(
yourColumn,
len(yourColumn) - 5 - charindex -- the length of the string excluding the data below
(
' ', -- look for the first space character
reverse -- reverse it
(
left -- get all but the last 5 characters of the value
(
yourColumn,
len(yourColumn) - 5
)
)
)
)
from YourTable
Upvotes: 2
Reputation: 29943
Explanations:
As an additional option, but if you use SQL Server 2017+, you may try a JSON-based approach. The idea is to transform the data into a valid JSON array, parse this array with OPENJSON()
to get the words and words' positions, and after that make the appropriate changes or simply filter the data:
Table:
CREATE TABLE Data (Sentence varchar(1000))
INSERT INTO Data (Sentence)
VALUES
('Space Planning April 2019'),
('Space Décor Planning May 2020'),
('Hidden September 2018')
Update the table:
Statement:
UPDATE Data
SET Sentence = (
SELECT STRING_AGG([value], ' ') WITHIN GROUP (ORDER BY CONVERT(int, [key]))
FROM OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(Sentence, 'json'), ' ', '","'), '"]'))
WHERE [value] NOT IN ('April', 'May', '2018', '2019', '2020')
)
SELECT *
FROM Data
Result:
Sentence
-----------------------------------------------------
Space Planning
Space Décor Planning
Hidden September
Select and filter words:
Statement:
SELECT d.Sentence, CONVERT(int, j.[key]) + 1 AS Position, j.[value] AS Word
FROM Data d
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(d.Sentence, 'json'), ' ', '","'), '"]')) j
WHERE j.[value] NOT IN ('April', 'May', '2018', '2019', '2020')
Result:
Sentence Position Word
--------------------------------------------------
Space Planning April 2019 1 Space
Space Planning April 2019 2 Planning
Space Décor Planning May 2020 1 Space
Space Décor Planning May 2020 2 Décor
Space Décor Planning May 2020 3 Planning
Hidden September 2018 1 Hidden
Upvotes: 0