thinking_hydrogen

Reputation: 189

Extract parts of a URL

Basically, I am looking to extract a URL up to the first path segment that is a number. That number may or may not be present.

Examples:

http://www.test.com/products/cards/product_code/12345/something_else
http://www.test.com/products/cards/product_code2/

Desired output:

http://www.test.com/products/cards/product_code/
http://www.test.com/products/cards/product_code2/

Additional information: I am looking for a language-agnostic regex, similar to the question "Getting parts of a URL (Regex)".

Many thanks.

Upvotes: 0

Views: 145

Answers (3)

Mickaël Le Baillif

Reputation: 2157

Using sed:

sed 's#\(http://.*/\)[0-9]\+.*#\1#'

which means:

  • capture everything from http:// up to and including a slash that is followed by one or more digits: \(http://.*/\)[0-9]\+ (because .* is greedy, this is the last such slash if the URL contains several numeric segments)
  • continue matching any characters up to the end of the line: .*
  • replace the whole match with what was captured: \1

I chose # as the sed separator instead of the classic / because otherwise you would have to escape every slash in the regexp. Note that \+ is a GNU sed extension; with other sed implementations, write [0-9][0-9]* instead.
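If the URL can contain more than one numeric segment and you want to cut at the first one, the kept segments can be forbidden from starting with a digit. A minimal Python sketch of that variant, assuming only http/https URLs and that "a number" means a path segment starting with a digit (the function name is illustrative):

```python
import re

# Group 1: scheme + host, then any path segments that do NOT start with a digit,
# then the slash in front of the first segment that DOES start with a digit.
# Everything from that digit to the end of the string is dropped.
FIRST_NUMERIC_SEGMENT = re.compile(r'^(https?://[^/]*(?:/[^/0-9][^/]*)*/)[0-9].*$')


def strip_from_first_number(url: str) -> str:
    """Cut url just before the first path segment that starts with a digit.

    If there is no such segment the pattern does not match and the URL is
    returned unchanged. Digits in the host part are allowed by [^/]*.
    """
    return FIRST_NUMERIC_SEGMENT.sub(r'\1', url)
```

With this pattern, http://h/a/1/b/2/c becomes http://h/a/, whereas the greedy sed command keeps http://h/a/1/b/.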

Upvotes: 0

blamonet

Reputation: 660

Here is a simple regex-based way of doing it:

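<?php

$url = "http://www.test.com/products/cards/product_code/1234";

// A slash immediately followed by a digit marks the start of the numeric segment.
$pattern = '/\/[0-9]/';

if (preg_match($pattern, $url, $matches)) {
    // Keep everything up to and including that slash.
    echo substr($url, 0, strpos($url, $matches[0]) + 1);
} else {
    // No numeric segment: print the URL unchanged.
    echo $url;
}
?>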
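The same position-based idea, sketched as an illustrative Python port (not part of the original answer): re.search finds the first slash followed by a digit, and match.start() plays the role of strpos. The function name is hypothetical.

```python
import re

# A slash directly followed by a digit, as in the PHP pattern '/\/[0-9]/'.
SLASH_DIGIT = re.compile(r'/[0-9]')


def cut_before_number(url: str) -> str:
    """Return url up to and including the slash before the first digit."""
    m = SLASH_DIGIT.search(url)
    if m is None:
        return url  # no "/<digit>" anywhere: keep the whole URL
    # m.start() is the index of the slash; +1 keeps the slash itself,
    # mirroring strpos(...) + 1 in the PHP code.
    return url[: m.start() + 1]
```

Like the PHP version, this scans the whole string, so a numeric host such as http://192.168.0.1/x is cut to http:// because the second slash of the scheme is followed by a digit.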

Upvotes: 0

Andy Lester

Reputation: 93636

This might not be a job for regexes, but for existing tools in your language of choice. What language are you using? Rather than a regex, you probably want an existing module that has already been written, tested, and debugged.

If you're using PHP, you want the parse_url function.

If you're using Perl, you want the URI module.

If you're using Ruby, use the URI module.
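For comparison, Python's standard library has the same kind of parser in urllib.parse.urlsplit. A minimal sketch, assuming that "a number" means a path segment starting with a digit and that anything after the cut (including a query string or fragment) can be discarded; the function name is illustrative:

```python
from urllib.parse import urlsplit, urlunsplit


def base_before_numeric_segment(url: str) -> str:
    """Keep path segments up to (not including) the first one starting with a digit.

    The host is taken from urlsplit, so digits in the host (for example an
    IP address) are never mistaken for a numeric path segment.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for i, seg in enumerate(segments):
        # seg[:1] is "" for empty segments, and "".isdigit() is False.
        if seg[:1].isdigit():
            path = "/".join(segments[:i]) + "/"
            # Query and fragment come after the cut point, so drop them too.
            return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return url  # no numeric segment found
```

Because the path is handled separately from the host, this avoids the IP-address pitfall of scanning the raw string.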

Upvotes: 1
