user1874064
user1874064

Reputation: 323

How to use a regex capture group in redshift (or alternative)

I have a field in a redshift column that looks like the following:

abcd1234df-TEXT_I-WANT

the characters and numbers in the first 10 digits can be either letters or numbers.

If I use a capture group regex, I would use a poorly written expression like (\w\w\w\w\w\w\w\w\w\w\W)(.*) and grap the 2nd group

But I'm having trouble implementing this in redshift, so not sure how I can grab only the stuff after the first hyphen

Upvotes: 8

Views: 7007

Answers (5)

wp78de
wp78de

Reputation: 18950

As mentioned before, regex might be an overkill. However, it could be useful in some cases.

Here's a basic replace pattern:

SELECT
    regexp_replace(
      'abcd1234df-TEXT_I-WANT'  -- use your input column here instead
    , '^[a-z0-9]{10}-(.*)$'     -- matches whole string, captures "TEXT_I-WANT" in $1
    , '$1'                      -- inserts $1 to return TEXT_I-WANT
    )
;

Upvotes: 12

Barbaros Özhan
Barbaros Özhan

Reputation: 65228

You can get the desired string by using the common table expression with regexp_substr as below :

with cte as
(
 select 'abcd1234df-TEXT_I-WANT' as str   
)    
select regexp_substr(str,'-.*') 
       as derived_str
  from cte;

derived_str
-------------
-TEXT_I-WANT  

Upvotes: 0

botchniaque
botchniaque

Reputation: 5084

@wp78de gives a very good advice to use REGEX_REPLACE. I allows you to choose the capture group. Using your regex, it would look like that, although you don't need 2 groups in here and using one is sufficient here.

select 
  regexp_replace(
    'abcd1234df-TEXT_I-WANT',
    '(\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W)(.*)', 
    '$2' -- replacement selecting 2nd capture group
  );

Another oprion, although less flexible is using REGEX_SUBSTR with e parameter set (Extract a substring using a subexpression). It allows you to select a substring, but only of a first capture group in your regex. You also have to set the position and occurence parameters to default 1:

Using REGEX you suggested, but only with 1 group:

select 
  regexp_substr(
    'abcd1234df-TEXT_I-WANT',
    '\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W(.*)', 
    1, -- position
    1, -- occurrence
    'e' -- parameters
  );

Upvotes: 4

Gordon Linoff
Gordon Linoff

Reputation: 1269773

Regular expressions might be overkill. Basic string operations are good enough:

select substring(col from position('-' in col) + 1)

Upvotes: 3

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

This can be done with charindex and substring.

substring(col,charindex('-',col)+1)

Upvotes: 1

Related Questions