Reputation: 1305
I have a table in Amazon redshift that lists api endpoints and their usage, and need to query the usage stats. Unfortunately some of the endpoints include ids in the name, so I need a way of grouping by the endopint regardless of what id is in the url.
Example data:
endpoint
'a/b/c'
'a/b/c/19'
'd/20'
'd/1'
'e/f'
'e/f'
I need a query that would take this data and output
endpoint, count(*)
'a/b/c/*', 2
'd/*', 2
'e/f' 2
So far I have just tried to exclude ones with specific ids using something along the lines of
SELECT
endpoint, count(*)
FROM
api_requests
WHERE
endpoint NOT LIKE '%/[0-9]/%'
GROUP BY
endpoint
ORDER BY
count(*)
DESC;
But a) This doesn't work for some reason, and b) ideally I would group them by the id instead
Any help would be greatly appreciated
Upvotes: 0
Views: 305
Reputation: 1270993
You can use regexp_replace()
:
select regexp_replace(endpoint, '/[0-9]+$', '') as canonical,
count(*)
from api_requests
group by canonical;
This gets rid of the last group group if it is all numbers.
Upvotes: 1