Daxon

Reputation: 1407

Regex: Substring the second last value between two slashes of a url string

I have a string like this:

http://www.example.com/value/1234/different-value

How can I extract the 1234?

Note: There may be a slash at the end:

http://www.example.com/value/1234/different-value
http://www.example.com/value/1234/different-value/

Upvotes: 6

Views: 4909

Answers (4)

AndyG

Reputation: 41220

I think this is a little simpler than the accepted answer, because it doesn't use any positive lookahead (?=), but rather simply makes the last slash optional via the ? character:

^.+\/(.+)\/.+\/?$

In Ruby:

STDIN.read.split("\n").each do |nextline|
  if nextline =~ /^.+\/(.+)\/.+\/?$/
    printf("matched %s in %s\n", $~[1], nextline)
  else
    puts "no match"
  end
end


Let's break down what's happening:

  • ^: start of the line
  • .+\/: match anything (greedily) up to a slash
    • Since the rest of the pattern explicitly matches at least one and at most two more slashes, this slash will be either the second last slash (as in http://www.example.com/value/1234/different-value) or the third last slash (as in http://www.example.com/value/1234/different-value/)
    • Up to this point we've matched http://www.example.com/value/ (due to greediness)
  • (.+)\/: Our capturing group for 1234, indicated by the parentheses. It's anything followed by another slash.
    • Since the previous match matched up to the second or third last slash, this will match up to the last slash or second last slash, respectively
  • .+: match anything. This would be after our 1234, so we're assuming there are characters after 1234/ (different-value)
  • \/?: optionally match another slash (the slash after different-value)
  • $: match the end of the line

Note that in a url, you probably won't have spaces. I used the . character because it's easily distinguished, but perhaps you might use \S instead to match non-spaces.

Also, in Ruby ^ and $ match at the start and end of every line, so you might use \A instead of ^ to match only at the start of the string, and \z instead of $ to match only at the end of the string (\Z additionally allows a single trailing newline).
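The two notes above can be combined into a stricter variant. The following is only a minimal sketch, not the pattern from the answer itself: it assumes the input is a single URL with no whitespace, swaps . for \S, and anchors with \A and \z (which match only at the very start and very end of the string). The names SEGMENT_RE and second_last_segment are made up for illustration.

```ruby
# Hypothetical helper: same structure as the pattern above, but with \S in
# place of . and with \A / \z in place of ^ / $.
SEGMENT_RE = /\A\S+\/(\S+)\/\S+\/?\z/

# Returns the captured path element, or nil when the string does not match.
def second_last_segment(url)
  match = SEGMENT_RE.match(url)
  match && match[1]
end

puts second_last_segment('http://www.example.com/value/1234/different-value')
puts second_last_segment('http://www.example.com/value/1234/different-value/')

# With \A and \S, a URL on the second line of a multi-line string is rejected.
p second_last_segment("junk\nhttp://www.example.com/value/1234/different-value")
```

Whether rejecting multi-line input is desirable depends on where the strings come from; with the line-by-line loop shown earlier, ^ and $ behave the same way in practice.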

Upvotes: 1

Todd A. Jacobs

Reputation: 84453

Use Slice for Positional Extraction

If you always want to extract the element at index 4 of the split URI (counting the scheme as index 0 and the empty string between the two slashes of // as index 1), and are confident that your data is regular, you can use Array#slice as follows.

'http://www.example.com/value/1234/different-value'.split('/').slice 4
#=> "1234"

'http://www.example.com/value/1234/different-value/'.split('/').slice 4
#=> "1234"

This will work reliably whether there's a trailing slash or not, whether or not the split yields more than five elements, and whether or not the element at index 4 is strictly numeric. It works because it's based on the element's position within the URI, rather than on the contents of the element. However, you will end up with nil if you attempt to parse a URI with fewer elements, such as http://www.example.com/1234/.
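To make the positional argument concrete, here is a small sketch that prints what String#split returns for the example URIs. The variable names are illustrative only; the point is that the scheme and the empty string between the two slashes of // occupy indexes 0 and 1, which is why 1234 sits at index 4.

```ruby
# Show what String#split produces, so the index passed to slice is unambiguous.
with_slash    = 'http://www.example.com/value/1234/different-value/'
without_slash = 'http://www.example.com/value/1234/different-value'
short         = 'http://www.example.com/1234/'

parts = with_slash.split('/')
parts.each_with_index { |part, index| puts "#{index}: #{part.inspect}" }

# split drops trailing empty strings, so both forms give the same array.
puts with_slash.split('/') == without_slash.split('/')

# Fewer elements: index 4 does not exist, so slice returns nil.
p short.split('/').slice(4)
```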

Use Scan/Match for Pattern Extraction

Alternatively, if you know that the element you're looking for is always the only one composed entirely of digits, you can use String#match with look-arounds to extract just the numeric portion of the string.

'http://www.example.com/value/1234/different-value'.match %r{(?<=/)\d+(?=/)}
#=> #<MatchData "1234">

$&
#=> "1234"

The look-behind and look-ahead assertions are needed to anchor the expression to a path. Without them, you'll match things like w3.example.com too. This solution is a better approach if the position of the target element may change, and if you can guarantee that your element of interest will be the only one that matches the anchored regex.

If there will be more than one match (e.g. http://www.example.com/1234/5678/) then you might want to use String#scan instead to select the first or last match. This is one of those "know your data" things; if you have irregular data, then regular expressions aren't always the best choice.

Upvotes: 2

FailedDev

Reputation: 26940

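JavaScript:

var subject = "http://www.example.com/value/1234/different-value";
var result = null;
// Skip the scheme separator, the host and the first path segment, then capture the digits.
var myregexp = /:\/\/.*?\/.*?\/(\d+)/;
var match = myregexp.exec(subject);
if (match != null) {
    result = match[1];
}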

This works with your examples, but it assumes the value is numeric and is always the second path segment, so it will fail for URLs with a different shape.

Ruby edit:

if subject =~ /:\/\/.*?\/.*?\/(.+?)\//
    match = $~[1]
end

It does work.

Upvotes: 1

Tim Pietzcker

Reputation: 336488

/([^/]+)(?=/[^/]+/?$)

should work. You might need to format it differently according to the language you're using. For example, in Ruby, it's

if subject =~ /\/([^\/]+)(?=\/[^\/]+\/?\Z)/
    match = $~[1]
else
    match = ""
end
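For a quick check against both forms from the question, the Ruby snippet can be wrapped in a small method. This is just a sketch; the names SECOND_LAST and second_last are invented for the example, and the empty-string fallback mirrors the else branch above.

```ruby
# Hypothetical wrapper around the look-ahead pattern from this answer.
SECOND_LAST = /\/([^\/]+)(?=\/[^\/]+\/?\Z)/

# Returns the second last slash-delimited element, or "" when nothing matches.
def second_last(subject)
  subject =~ SECOND_LAST ? $~[1] : ""
end

puts second_last('http://www.example.com/value/1234/different-value')
puts second_last('http://www.example.com/value/1234/different-value/')

# The pattern is purely positional: with a single path element it returns
# whatever precedes that element, which here is the host name.
puts second_last('http://www.example.com/1234/')
```

The last call shows that the pattern does not distinguish the host from path elements, so it is worth validating the URL shape first if shorter URLs can occur.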

Upvotes: 4
