Mr.Meeseeks
Mr.Meeseeks

Reputation: 21

How to group by a regular expression in a postgres query

I'm cleaning up someone else's restful application, and in doing so it appears that there are some routes that aren't being used. To begin troubleshooting, I created a table with a unique text column to store the route.

---------routes---------
https://test.com/user/1/info
https://test.com/test/2/info
https://test.com/banana/100
https://test.com/post/3/date
https://test.com/post/
https://test.com/grape/
http://test.com/post/3/date
https://test.com/banana/3
https://test.com/user/2/info
https://test.com/test/5/info
.
.
.

now what id like to do is, query using some regular expression (or whatever), group the entries above to get the following results:

---------routes---------
https://test.com/user/{x}/info
https://test.com/test/{x}/info
https://test.com/post/{x}/date
https://test.com/post/
https://test.com/grape/
http://test.com/post/{x}/date
https://test.com/banana/{x}

where {x} is just some generic marker that comes as a result of the grouping. I know we can search for a specific regular expression, but I don't how to try and collapse the strings into groups and then spit back out the 'recommended' groupings

PS: because we are stuck in the stone age, any solution is limited by postgresql 8.4.20

EDIT--

klin, your answer isn't not quite working for me, as its giving me

     regexp_replace        | count 
------------------------------+-------
 https://test.com/user/1/info |     1
 https://test.com/test/2/info |     1
 https://test.com/banana/100  |     1
 \x01{x}ate                   |     2
 https://test.com/user/2/info |     1
 https://test.com/grape/      |     1
 https://test.com/test/5/info |     1
 https://test.com/post/       |     1
 https://test.com/banana/3    |     1
(9 rows)

but at least this gives me some ideas, I'll post back when I play with it a bit more

Upvotes: 2

Views: 3295

Answers (1)

klin
klin

Reputation: 121889

I'm not able to test this in 8.4...

with routes(url) as (
values
    ('https://test.com/user/1/info'),
    ('https://test.com/test/2/info'),
    ('https://test.com/banana/100'),
    ('https://test.com/post/3/date'),
    ('https://test.com/post/'),
    ('https://test.com/grape/'),
    ('http://test.com/post/3/date'),
    ('https://test.com/banana/3'),
    ('https://test.com/user/2/info'),
    ('https://test.com/test/5/info')
)

select regexp_replace(url, '^(.+//.+/.+/)\d+', '\1{x}'), count(*)
from routes
group by 1

         regexp_replace         | count 
--------------------------------+-------
 https://test.com/banana/{x}    |     2
 https://test.com/post/{x}/date |     1
 http://test.com/post/{x}/date  |     1
 https://test.com/user/{x}/info |     2
 https://test.com/test/{x}/info |     2
 https://test.com/grape/        |     1
 https://test.com/post/         |     1
(7 rows)    

You can test this here (Postgres 9.5).

Check pattern here.

Upvotes: 5

Related Questions