Reputation: 19
I am trying to check best and optimal way to exclude a pattern of string from a column, without affecting actual data.
With in Redshift DW, I have table column company which has certain records ending with INC in different ways so want to exclude string patterns of INC and capture just company name. Please see below sample data and expected output.
WITH T AS (
select 'Cincin,Inc' id
union all
select 'Tinc, INc.' id
union all
select 'Cloud' id
union all
select 'Dinct Inc.' id
)
select id , regexp_replace(id,{exp}) from T
/**OutPut***/
Cincin
Tinc
Cloud
Dinct
Upvotes: 0
Views: 119
Reputation: 425013
Redshift doesn't support case insensitivity with regex, but given your target string is small, you can work around it without too much pain by using [Ii][Nn][Cc]
:
regexp_replace(id, ',? *[Ii][Nn][Cc]\.?$', '')
See live demo.
Test:
WITH T AS (
select 'Cincin,Inc' id
union all
select 'Tinc, INc.' id
union all
select 'Cloud' id
union all
select 'Dinct Inc.' id
)
select id , regexp_replace(id, ',? *[Ii][Nn][Cc]\.?$', '') from T
Output:
Cincin
Tinc
Cloud
Dinct
Upvotes: 1
Reputation: 2440
You can use this if you are not particluar about the case
WITH T AS (
select 'Cincin,Inc' id
union all
select 'Tinc, INc.' id
union all
select 'Cloud' id
union all
select 'Dinct Inc.' id
)
select id , regexp_replace(lower(iD),'[^a-z]+(inc)([^a-z])*','')
from T
output :
id regexp_replace
Cincin,Inc cincin
Tinc, INc. tinc
Cloud cloud
Dinct Inc. dinct
Upvotes: 0
Reputation: 521194
Try replacing the pattern ,?\s*Inc\.?$
:
select id, regexp_replace(id, ',?\\s*[Ii][Nn][Cc]\\.?$', '') from T
Upvotes: 0