Nate

Reputation: 28444

How to write a regex to extract a number from these URLs?

I'm trying to write a regex to match the numbers in these URLs (12345678 and 1234567890).

http://www.example.com/p/12345678
http://www.example.com/p/12345678?foo=bar
http://www.example.com/p/some-text-123/1234567890?foo=bar

Rules:

My attempt:

\/p\/([0-9]+)

That matches the first and second, but not the third. So I tried:

\/p\/[^\/?]*\/?([0-9]+)

No joy.

REGEX 101

Upvotes: 5

Views: 381

Answers (5)

superultranova

Reputation: 1314

Regex might not be the right tool for this job. It looks like in every case, splitting the URL with a URL parser would make more sense. From your examples, it appears that the number portion is always the last item in the path portion of the URL. I'm not sure what language you're using, but many languages offer functions that can parse URLs into their constituent parts.

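$path = parse_url($url, PHP_URL_PATH);
// parse_url() returns false or null when no path can be extracted.
if (is_string($path) && strpos($path, "/p/") === 0) {
    // The last path segment is the candidate number.
    $base = basename($path);
} else {
    // error: not a /p/ URL
}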

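This works for all three example URLs, assuming $url holds the URL string. $base then contains the last path segment, which you may still want to validate as numeric (for example with ctype_digit).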
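For readers not using PHP, the same idea can be sketched in Python with the standard library's urllib.parse. This is a minimal sketch and not the answerer's code: the function name extract_id and the digits-only check are assumptions added for illustration.

```python
from urllib.parse import urlsplit


def extract_id(url):
    """Return the trailing numeric path segment of a /p/ URL, or None.

    Hypothetical helper: the name and the digits-only check are
    illustrative assumptions, not part of the original answer.
    """
    path = urlsplit(url).path  # the query string is not part of .path
    if not path.startswith("/p/"):
        return None
    last = path.rsplit("/", 1)[-1]  # text after the final slash
    # Accept ASCII digits only, mirroring [0-9]+ in the question.
    if last.isascii() and last.isdigit():
        return last
    return None


urls = [
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/12345678?foo=bar",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
]
for url in urls:
    print(extract_id(url))
```

Because the parser separates the query string from the path, digits inside ?foo=bar never enter the check.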

Upvotes: 2

vks

Reputation: 67988

\/p\/(?:.*\/)?(\d+)\b

You can try this. It captures the integers according to your conditions. See the demo and grab the first capture group.

https://regex101.com/r/dU7oN5/29

$re = "/\\/p\\/(?:.*\\/)?(\\d+)\\b/";
$str = "http://www.example.com/p/12345678\nhttp://www.example.com/p/12345678?foo=bar\nhttp://www.example.com/p/some-text-123/1234567890?foo=bar";

preg_match_all($re, $str, $matches);
// $matches[1] now holds the captured numbers, one per URL.
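As a quick check outside PHP, here is a minimal Python sketch of the same pattern. The names PATTERN, text and ids are illustrative assumptions; only the regex itself comes from the answer.

```python
import re

# Same pattern as in the answer, written as a Python raw string
# (Python has no regex delimiters, so the slashes need no escaping).
PATTERN = re.compile(r"/p/(?:.*/)?(\d+)\b")

text = "\n".join([
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/12345678?foo=bar",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
])

# With exactly one capture group, findall returns the captured
# text of each match. '.' does not cross newlines by default,
# so each URL line is matched on its own.
ids = PATTERN.findall(text)
print(ids)  # ['12345678', '12345678', '1234567890']
```

One thing to watch: because .* is greedy, a slash inside the query string moves the match. For example, /p/123?next=/a/456 would yield 456 rather than 123.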

Upvotes: 0

Robin

Reputation: 9644

If I understand correctly, the digits you want:

  • come right after the last slash of the URL
  • cannot be part of the query variables, i.e. /p/123?foo=bar456 matches 123 and
    /p/foobar?foo=bar456 matches nothing

You can then use the following regex:

(?=/p/).*/\K\d+

Explanation

(?=/p/)  # lookahead: the match must start where '/p/' begins
.*/      # go to the last '/' thanks to greediness
\K       # leave everything we have so far out of the final match
\d+      # select the digits just after the last '/'

To avoid escaping forward slashes don't use them as regex delimiters: #(?=/p/).*/\K\d+# will do fine.
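Note that \K is a PCRE feature and Python's built-in re module does not support it. A rough Python equivalent, offered as a sketch only, swaps \K for a capture group; the helper name digits_after_last_slash is an assumption made for this example.

```python
import re

# (?=/p/) requires the match to start where '/p/' begins,
# .*/ runs greedily up to the last '/', and the capture group
# stands in for PCRE's \K\d+ (Python's re has no \K).
PATTERN = re.compile(r"(?=/p/).*/(\d+)")


def digits_after_last_slash(url):
    """Hypothetical helper: return the digits after the last '/', or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


print(digits_after_last_slash("http://www.example.com/p/123?foo=bar456"))     # 123
print(digits_after_last_slash("http://www.example.com/p/foobar?foo=bar456"))  # None
```

The two calls mirror the two cases listed above: digits in the query variables alone never produce a match.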

See demo here.

Upvotes: 0

msrd0

Reputation: 8420

I extended your version; it now works with all of your examples:

\/p\/(.+\/)*(\d+)(\?.+=.+(&.+=.+)*)?$

If you don't care whether the URL is valid, you could shrink the regex to:

\/p\/(.+\/)*(\d+)($|\?)

https://regex101.com/r/pW5qB3/2
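To see how the two variants differ, here is a small Python sketch. The names STRICT, LOOSE and capture are assumptions for illustration; the patterns are the two from this answer, and the number is in group 2 in both.

```python
import re

# Long form: also requires a key=value style query string, if one is present.
STRICT = re.compile(r"/p/(.+/)*(\d+)(\?.+=.+(&.+=.+)*)?$")
# Short form: only requires the digits to be followed by end of string or '?'.
LOOSE = re.compile(r"/p/(.+/)*(\d+)($|\?)")


def capture(pattern, url):
    """Hypothetical helper: return group 2 (the number), or None."""
    match = pattern.search(url)
    return match.group(2) if match else None


url = "http://www.example.com/p/some-text-123/1234567890?foo=bar"
print(capture(STRICT, url))  # 1234567890
print(capture(LOOSE, url))   # 1234567890

# A query string without '=' is rejected by the long form only.
odd = "http://www.example.com/p/12345678?foo"
print(capture(STRICT, odd))  # None
print(capture(LOOSE, odd))   # 12345678
```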

Upvotes: 1

Avijit

Reputation: 1229

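// Requires: using System.Text.RegularExpressions;
// Matches the first '/' that is immediately followed by digits.
var regex = new Regex(@"/(?<ticket>\d+)");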

var subject = "http://www.example.com/p/some-text-123/1234567890?foo=bar";

var ticket = regex.Match(subject).Groups["ticket"].Value;

Output: 1234567890

Upvotes: -2
