Vinay billa
Vinay billa

Reputation: 309

How to remove set of words from a string in SQL?

I have a column in my table which looks like this:

Column 1
-------------------------
Space Planning April 2019 
Space Décor Planning May 2020 
Hidden September 2018

How do I query to remove the month and year words from the column to show me a result like

Column 1
------------------------
Space Planning  
Space Décor Planning 
Hidden

In general, how to dynamically filter out words from a string in a column using SQL.

Upvotes: 0

Views: 2651

Answers (3)

Attie Wagner
Attie Wagner

Reputation: 1362

I think this will work:

declare @table  table   (id int primary key identity,data   varchar(100))
insert into @table values
    ('Space Planning April 2019 ')
,   ('Space Décor Planning May 2020 ')
,   ('Hidden September 2018');


select
    data    OriginalData
,   replace(data,right(rtrim(data),5),'')   ReplacedYear
,   case when replace(data,right(rtrim(data),5),'') like '%Jan%' then replace(replace(data,right(rtrim(data),5),''),' January','')
            when replace(data,right(rtrim(data),5),'') like '%Feb%' then replace(replace(data,right(rtrim(data),5),''),' February','')
            when replace(data,right(rtrim(data),5),'') like '%Mar%' then replace(replace(data,right(rtrim(data),5),''),' March','')
            when replace(data,right(rtrim(data),5),'') like '%Apr%' then replace(replace(data,right(rtrim(data),5),''),' April','')
            when replace(data,right(rtrim(data),5),'') like '%May%' then replace(replace(data,right(rtrim(data),5),''),' May','')
            when replace(data,right(rtrim(data),5),'') like '%Jun%' then replace(replace(data,right(rtrim(data),5),''),' June','')
            when replace(data,right(rtrim(data),5),'') like '%Jul%' then replace(replace(data,right(rtrim(data),5),''),' July','')
            when replace(data,right(rtrim(data),5),'') like '%Aug%' then replace(replace(data,right(rtrim(data),5),''),' August','')
            when replace(data,right(rtrim(data),5),'') like '%Sep%' then replace(replace(data,right(rtrim(data),5),''),' September','')
            when replace(data,right(rtrim(data),5),'') like '%Oct%' then replace(replace(data,right(rtrim(data),5),''),' October','')
            when replace(data,right(rtrim(data),5),'') like '%Nov%' then replace(replace(data,right(rtrim(data),5),''),' November','')
            when replace(data,right(rtrim(data),5),'') like '%Dec%' then replace(replace(data,right(rtrim(data),5),''),' December','')
            else replace(data,right(rtrim(data),5),'') end  ExpectedResults
from    @table;

Results:

Expected Results

EDIT

An alternative solution is to run the following query which gives you the same results (using the same source data above):

with cte as (
select
    id
,   rank() over (order by data) rnk
,   ltrim(rtrim(Value))         value
from    @table
outer apply string_split(data,' ')
where   ltrim(rtrim(Value)) <> ''
and     ltrim(rtrim(Value)) not in ('January','February','March','April','May','June','July','August','September','October','November','December')
and     value not in (select right(rtrim(data),4) from @table)
)

select
string_agg([value], ' ') within group (order by id)
from    cte
group by rnk

Results:

Expected2

See DEMO

Upvotes: 0

allmhuran
allmhuran

Reputation: 4454

The way to do it "in general" is to first identify some recognisable pattern that works for all of the data in your specific case. Then you write code which makes use of that pattern. Yeah, I know that sounds weird. Let me explain.

In your case, you seem to have a pattern in the form: "some kind of data", then a space, then a month name, then a space, then a 4 digit year. Is this pattern always consistent? Is there always a 4 digit year at the end? Can you be sure of that? If so then you can remove the month name and year by finding the position of the first space which is more than 5 characters from the end of the value (this will be the space between "the data" and "the month name"), then removing it and everything after it. Here is an example of how you could do that. It's not the only way, I have tried to write an algorithm that you can understand. Read it from the inside out!:

declare @somedata varchar(64) = 'Space Planning May 2019'

select  left
        (
            @somedata,
            len(@somedata) - 5 - charindex -- the length of the string excluding the data below
            (
                ' ', -- look for the first space character
                reverse -- reverse it
                (
                    left -- get all but the last 5 characters of the value
                    (
                        @somedata,
                        len(@somedata) - 5
                    )
                )
            )
        )

To get this result for a whole column called yourColumn in a table which I will assume is called YourTable, just add the from clause and replace the variable with the actual column name:

select  newColumn = left
        (
            yourColumn,
            len(yourColumn) - 5 - charindex -- the length of the string excluding the data below
            (
                ' ', -- look for the first space character
                reverse -- reverse it
                (
                    left -- get all but the last 5 characters of the value
                    (
                        yourColumn,
                        len(yourColumn) - 5
                    )
                )
            )
        )
from    YourTable

Upvotes: 2

Zhorov
Zhorov

Reputation: 29943

Explanations:

As an additional option, but if you use SQL Server 2017+, you may try a JSON-based approach. The idea is to transform the data into a valid JSON array, parse this array with OPENJSON() to get the words and words' positions, and after that make the appropriate changes or simply filter the data:

Table:

CREATE TABLE Data (Sentence varchar(1000))
INSERT INTO Data (Sentence) 
VALUES
   ('Space Planning April 2019'),
   ('Space Décor Planning May 2020'), 
   ('Hidden September 2018')

Update the table:

Statement:

UPDATE Data
SET Sentence = (
   SELECT STRING_AGG([value], ' ') WITHIN GROUP (ORDER BY CONVERT(int, [key]))
   FROM OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(Sentence, 'json'), ' ', '","'), '"]'))
   WHERE [value] NOT IN ('April', 'May', '2018', '2019', '2020')
)
SELECT *
FROM Data

Result:

Sentence
-----------------------------------------------------
Space Planning
Space Décor Planning
Hidden September

Select and filter words:

Statement:

SELECT d.Sentence, CONVERT(int, j.[key]) + 1 AS Position, j.[value] AS Word
FROM Data d
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(d.Sentence, 'json'), ' ', '","'), '"]')) j
WHERE j.[value] NOT IN ('April', 'May', '2018', '2019', '2020') 

Result:

Sentence                        Position Word
--------------------------------------------------
Space Planning April 2019       1        Space
Space Planning April 2019       2        Planning
Space Décor Planning May 2020   1        Space
Space Décor Planning May 2020   2        Décor
Space Décor Planning May 2020   3        Planning
Hidden September 2018           1        Hidden

Upvotes: 0

Related Questions