Reputation: 1305
I have a redshift database that I need to query, and need to group similar strings together. I'm using regexp_replace() to do this, but can't work out how to get it grouping strings that have integers in the middle. For example:
dataset:
string
'aaa/123/bbb'
'aaa/456/bbb'
'ccc/123/ddd'
I need to group this so that we'd get
string count(*)
aaa/id/bbb 2
ccc/id/ddd 1
So I have tried using
regexp_replace(endpoint, '/[0-9]+$/', '/id/')
But it doesn't work, I'm assuming because there is no wildcard or something? But I can't work out how to fix this.
Thanks in advance
Upvotes: 0
Views: 889
Reputation: 1269603
I know that you also want to replace the numbers at the end. This comes close to what you want:
select regexp_replace(endpoint, '/[0-9]+(/|$)', '/id/')
from (select 'aaa/123/bbb' as endpoint union all
select 'aaa/123' as endpoint
) x
But it returns a slash at the end in the second case.
If you have no other intermediate values that start with numbers, then this does what you want:
select regexp_replace(endpoint, '/[0-9]+', '/id')
from (select 'aaa/123/bbb' as endpoint union all
select 'aaa/123' as endpoint
) x
Otherwise, two calls to regexp_replace()
does the trick:
select regexp_replace(regexp_replace(endpoint, '/[0-9]+/', '/id/'), '/[0-9]$', '/id')
from (select 'aaa/123/bbb' as endpoint union all
select 'aaa/123' as endpoint
) x;
Upvotes: 1