Use postgres regexp_replace() to replace integers in a string

Question

I have a redshift database that I need to query, and need to group similar strings together. I'm using regexp_replace() to do this, but can't work out how to get it grouping strings that have integers in the middle. For example:

dataset:

string
'aaa/123/bbb'
'aaa/456/bbb'
'ccc/123/ddd'

I need to group this so that we'd get

string     count(*)
aaa/id/bbb 2
ccc/id/ddd 1

So I have tried using

regexp_replace(endpoint, '/[0-9]+$/', '/id/')

But it doesn't work, I'm assuming because there is no wildcard or something? But I can't work out how to fix this.

Thanks in advance

Gordon Linoff · Accepted Answer

I know that you also want to replace the numbers at the end. This comes close to what you want:

select regexp_replace(endpoint, '/[0-9]+(/|$)', '/id/')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x

But it returns a slash at the end in the second case.

If you have no other intermediate values that start with numbers, then this does what you want:

select regexp_replace(endpoint, '/[0-9]+', '/id')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x

Otherwise, two calls to regexp_replace() does the trick:

select regexp_replace(regexp_replace(endpoint, '/[0-9]+/', '/id/'), '/[0-9]$', '/id')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x;

Use postgres regexp_replace() to replace integers in a string

Answers (1)

Related Questions