Reputation: 189
Basically, I want to extract a URL up to the point where it encounters a number, which may or may not be present.
Examples:
http://www.test.com/products/cards/product_code/12345/something_else
http://www.test.com/products/cards/product_code2/
Desired output -
http://www.test.com/products/cards/product_code/
http://www.test.com/products/cards/product_code2/
Additional Information - Language-agnostic regex, similar to this question: Getting parts of a URL (Regex)
Many Thanks
Upvotes: 0
Views: 145
Reputation: 2157
Using sed:
sed 's#\(http://.*/\)[0-9]\+.*#\1#'
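which means:
\(http://.*/\)[0-9]\+ : match from http:// up to a slash that is followed by one or more digits, capturing everything up to and including that slash (the digits are not part of the capture). Because .* is greedy, this is the last such slash if there are several.
.* : match the rest of the line
\1 : replace the whole match with the captured part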
I chose # as the sed separator instead of the classic / because otherwise you would have to escape every / in the regexp.
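For readers not using sed, the same substitution can be sketched in Python with the standard re module. This is only an illustration, not part of the original answer: the function name strip_from_number is hypothetical, and the pattern uses a non-greedy .*? so that it stops at the first slash followed by a digit, which is an assumption about the intent (the sed version above is greedy).

```python
import re

# Capture everything up to and including the first slash that is
# directly followed by digits, then match (and discard) the rest.
# The non-greedy `.*?` is an assumption; sed's `.*` is greedy.
NUMERIC_TAIL = re.compile(r'^(https?://.*?/)[0-9]+.*$')

def strip_from_number(url: str) -> str:
    """Return the URL cut just before the first '/<digits>' part.

    URLs without such a part are returned unchanged.
    """
    return NUMERIC_TAIL.sub(r'\1', url)

if __name__ == "__main__":
    print(strip_from_number(
        "http://www.test.com/products/cards/product_code/12345/something_else"))
    print(strip_from_number(
        "http://www.test.com/products/cards/product_code2/"))
```

Note that a host starting with a digit (for example an IP address) would also match right after http://, so real input may need a stricter pattern.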
Upvotes: 0
Reputation: 660
Here is a simple regex way of doing it:
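<?php
$url = "http://www.test.com/products/cards/product_code/1234";
// Match the first slash that is immediately followed by a digit.
$pattern = '/\/[0-9]/';
if (preg_match($pattern, $url, $matches)) {
    // Keep everything up to and including that slash.
    echo substr($url, 0, strpos($url, $matches[0]) + 1);
} else {
    // No slash followed by a digit: output the URL unchanged.
    echo $url;
}
?>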
Upvotes: 0
Reputation: 93636
This might not be a job for regexes, but for existing tools in your language of choice. What language are you using? You are probably better off with an existing module that has already been written, tested, and debugged than with a hand-rolled regex.
If you're using PHP, you want the parse_url function.
If you're using Perl, you want the URI module.
If you're using Ruby, use the URI module.
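As a rough illustration of the parser-based approach, here is a minimal sketch in Python using the standard urllib.parse module (Python is not one of the languages named above; it is used here only as an example). The function name strip_from_numeric_segment is hypothetical, and the sketch assumes that "a number" means a path segment consisting only of digits.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_from_numeric_segment(url: str) -> str:
    """Cut the URL just before its first all-digit path segment.

    Assumption: a "number" is a path segment made only of digits.
    URLs without such a segment are returned unchanged.
    """
    parts = urlsplit(url)
    kept = []
    for segment in parts.path.split('/'):
        if segment.isdigit():
            break  # first numeric segment found: stop collecting
        kept.append(segment)
    else:
        return url  # loop finished without finding a numeric segment
    kept.append('')  # preserve the trailing slash before the number
    # Query string and fragment are dropped along with the cut-off tail.
    return urlunsplit((parts.scheme, parts.netloc, '/'.join(kept), '', ''))

if __name__ == "__main__":
    print(strip_from_numeric_segment(
        "http://www.test.com/products/cards/product_code/12345/something_else"))
    print(strip_from_numeric_segment(
        "http://www.test.com/products/cards/product_code2/"))
```

Because only the path is inspected, digits in the host name or port cannot trigger a cut, which is the main advantage over matching the raw string.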
Upvotes: 1