Query redshift grouping by similar strings

Question

I have a table in Amazon redshift that lists api endpoints and their usage, and need to query the usage stats. Unfortunately some of the endpoints include ids in the name, so I need a way of grouping by the endopint regardless of what id is in the url.

Example data:

endpoint
'a/b/c'
'a/b/c/19'
'd/20'
'd/1'
'e/f'
'e/f'

I need a query that would take this data and output

endpoint, count(*)
'a/b/c/*', 2
'd/*',     2
'e/f'      2

So far I have just tried to exclude ones with specific ids using something along the lines of

SELECT 
    endpoint, count(*) 
FROM 
    api_requests 
WHERE 
    endpoint NOT LIKE '%/[0-9]/%'
GROUP BY 
    endpoint 
ORDER BY 
    count(*) 
DESC;

But a) This doesn't work for some reason, and b) ideally I would group them by the id instead

Any help would be greatly appreciated

Gordon Linoff · Accepted Answer

You can use regexp_replace():

select regexp_replace(endpoint, '/[0-9]+$', '') as canonical,
       count(*)
from api_requests 
group by canonical;

This gets rid of the last group group if it is all numbers.

Query redshift grouping by similar strings

Answers (1)

Related Questions