David
David

Reputation: 11573

Redshift Extract String Between Two Patterns (regexp_substr)

I'm trying to extract the text following deviceSerialNumber= that occurs until either a & or the end of the string. Data looks like

someddata&=somedataagain&deviceSerialNumber=device12345&anotherField=moreData someddata&=somedataagain&deviceSerialNumber=deviceabcd

I've tried this

REGEXP_SUBSTR(session_tags || '&', 'deviceSerialNumber(.*)&') from table

but, this returns all of the text until the final & (which is the end of the string, since I'm appending a & for pattern matching purposes). How do I have to modify this regular expression to extract only the text until the first &?

Upvotes: 3

Views: 9804

Answers (4)

naba
naba

Reputation: 181

I know it is late to respond, but here is solution that worked for me.

select regexp_substr(
    'someddata&=somedataagain&deviceSerialNumber=device12345&anotherField=moreData ', 
    'deviceSerialNumber=(.*)&', 0, 1, 'e');

Upvotes: 8

emehex
emehex

Reputation: 10538

I was running into the same Invalid preceding regular expression prior to repetition operator with regexp_substr.

The work around I eventually settled on was two nested split_parts:

select
params, 
split_part(split_part(params, 'deviceSerialNumber=', 2), '&', 1)
from (
    select 'someddata&=somedataagain&deviceSerialNumber=device12345&anotherField=moreData' as params
    union all
    select 'someddata&=somedataagain&deviceSerialNumber=deviceabcd' as params
) tmp

Upvotes: 0

David
David

Reputation: 11573

Found a hack solution that involves two levels of queries to get around having to use regexp_subtr. The inner query uses substring and position to pull out all of the text after the deviceSerialNumber tag. The outer query uses the same two functions to cut off any text after the next &

select substring(pre_serial_num, 1, position('&' in pre_device_id || '&') - 1) as device_id
from
    (select substring(session_tags,position('deviceSerialNumber' in session_tags) + 20, 40) as pre_device_id 
    from table) a

eg the inner query takes

someddata&=somedataagain&deviceSerialNumber=device12345&anotherField=moreData

someddata&=somedataagain&deviceSerialNumber=deviceabcd

and strips text before the device serial number tag to give you

device12345&anotherField=moreData

deviceabcd

The second query then strips text after the device serial number tag to give you

deviceSerialNumber=device12345

deviceSerialNumber=deviceabcd

Upvotes: -3

nobody
nobody

Reputation: 11080

Use occurence parameter

REGEXP_SUBSTR(session_tags, 'deviceSerialNumber=(.+?)&',1,1) from table

Or

REGEXP_SUBSTR(session_tags,(?<=deviceSerialNumber=)(.*?)(?=&)) from table

.* will match till the last & and .*? will match till the first &

Upvotes: 1

Related Questions