Reputation: 7768
how can I extract the numerical values between /
and .html
?
http://www.site.com/prod/bunch-of-seo-text-of-different-length/12345687.html
http://www.site.com/prod/bunch-of-seo-text-of-different-length/12345688.html
http://www.site.com/prod/bunch-of-seo-text-of-different-length/12345688.html
I tried this, but I don't know about the syntax
SUBSTRING(link FROM '%/%' FOR '%.html%')
Upvotes: 1
Views: 690
Reputation: 656251
The question is:
how can I extract the numerical values between / and .html?
And a proper answer is:
SELECT substring(link, '/(\d+)\.html')
\d
.. the class shorthand for [[:digit:]]
which is equivalent to [0-9]
+
.. one or more of the preceding atom
()
.. parenthesis capture the match
\.
.. dot has to be escaped to loose its special meaning
Test:
WITH x(link) AS (
VALUES
('http://www.site.com/prod/bunch-of-text-of-different-length/12345687.html')
,('http://www.site.com/prod/bunch-of-text/12345688.html')
,('http://www.site2.com/prod/123/text-of-di456fferent-89-len/12345688.html')
)
SELECT substring(link, '/(\d+)\.html') FROM x;
regexp_matches()
is meant for capturing multiple matches and is not a good choice for this.
Upvotes: 2
Reputation:
Another option would be to use the substring function:
select substring(link from '[0-9]+')
from your_table;
This assumes that there is exactly one "number" in the URL.
Upvotes: 0
Reputation: 9582
I am not a postgresql guy, but it looks like it has some built-in regex functions. See PostgreSQL Matching Functions and find the section where it describes the regexp_matches
function. regexp_matches
returns an array ({}).
SELECT regexp_matches(link, '([0-9]+)\.html$');
Combine this with a unnest
see PostgreSQL Arrays:
SELECT unnest(SELECT regexp_matches(link, '([0-9]+)\.html$')) as matches;
Upvotes: 1
Reputation: 10743
This snippet will give you the filename of any url:
SELECT REVERSE(SUBSTRING(REVERSE(REPLACE('http://www.somewhere.com/folder/page.html', '\', '/')), 1, CHARINDEX('/', REVERSE(REPLACE('http://www.somewhere.com/folder/page.html', '\', '/'))) - 1)) as PageName
Output:
page.html
Or, if you are passing in the url as an nvarchar variable:
SELECT REVERSE(SUBSTRING(REVERSE(REPLACE(@Url, '\', '/')), 1, CHARINDEX('/', REVERSE(REPLACE(@Url, '\', '/'))) - 1)) as PageName
Upvotes: 0