Reputation: 1407
I have a string like this:
http://www.example.com/value/1234/different-value
How can I extract the 1234?
Note: There may be a slash at the end:
http://www.example.com/value/1234/different-value
http://www.example.com/value/1234/different-value/
Upvotes: 6
Views: 4909
Reputation: 41220
I think this is a little simpler than the accepted answer, because it doesn't use any positive lookahead (?=), but rather simply makes the last slash optional via the ? character:
^.+\/(.+)\/.+\/?$
In Ruby:
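# Read every line from standard input and try the pattern on each one
STDIN.read.split("\n").each do |nextline|
  if nextline =~ /^.+\/(.+)\/.+\/?$/
    # $~ is the last MatchData; [1] is the first capturing group
    printf("matched %s in %s\n", $~[1], nextline)
  else
    puts "no match"
  end
end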
Let's break down what's happening:
^ : start of the line
.+\/ : match anything (greedily) up to a slash. This is the second-last slash (as in http://www.example.com/value/1234/different-value) or the third-last slash (as in http://www.example.com/value/1234/different-value/). Either way, this part matches http://www.example.com/value/ (due to greediness)
(.+)\/ : our capturing group for 1234, indicated by the parentheses. It's anything followed by another slash.
.+ : match anything. This would be after our 1234, so we're assuming there are characters after 1234/ (different-value)
\/? : optionally match another slash (the slash after different-value)
$ : match the end of the line
Note that in a URL, you probably won't have spaces. I used the . character because it's easily distinguished, but perhaps you might use \S instead to match non-spaces.
Also, you might use \A instead of ^ to match the start of the string (instead of after a line break) and \Z instead of $ to match the end of the string (instead of at a line break).
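To make those two suggestions concrete, here is a minimal sketch that swaps . for \S and the line anchors for string anchors. The method name second_to_last_segment is hypothetical, and \z (strict end of string) is used here as an assumption in place of \Z, which also tolerates one trailing newline.

```ruby
# Hypothetical helper, not part of the answer above.
# \A and \z anchor to the whole string; \S replaces "." so that
# whitespace anywhere in the input makes the match fail.
def second_to_last_segment(url)
  m = url.match(%r{\A\S+/(\S+)/\S+/?\z})
  m && m[1] # nil when the pattern does not match
end

second_to_last_segment('http://www.example.com/value/1234/different-value')
#=> "1234"
second_to_last_segment('http://www.example.com/value/1234/different-value/')
#=> "1234"
```

Writing the pattern with %r{...} avoids escaping every slash.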
Upvotes: 1
Reputation: 84453
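If you always want the element at index 4 after splitting a URI on slashes (the scheme, the empty string between the two slashes of ://, the host, and the first path segment come before it), and are confident that your data is regular, you can use Array#slice as follows.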
'http://www.example.com/value/1234/different-value'.split('/').slice 4
#=> "1234"
'http://www.example.com/value/1234/different-value/'.split('/').slice 4
#=> "1234"
This will work reliably whether there's a trailing slash or not, whether or not further elements follow it after the split, and whether or not that element is always strictly numeric. It works because it's based on the element's position within the path, rather than on the contents of the element. However, you will end up with nil if you attempt to parse a URI with fewer elements, such as http://www.example.com/1234/.
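A variation on the position-based idea, sketched here as an assumption rather than as this answer's method: parse the string with the standard library's URI.parse first and index into the path only, so the scheme and host no longer shift the position. The helper name path_segment and its zero-based index parameter are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: returns the path segment at a zero-based position,
# or nil when the path has too few segments.
def path_segment(url, index)
  # URI#path is e.g. "/value/1234/different-value/"; splitting it leaves a
  # leading empty string, which reject(&:empty?) removes.
  segments = URI.parse(url).path.split('/').reject(&:empty?)
  segments[index]
end

path_segment('http://www.example.com/value/1234/different-value/', 1)
#=> "1234"
path_segment('http://www.example.com/1234/', 1)
#=> nil
```

The caller still has to decide what nil means for its data, just as with the slice version.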
Alternatively, if you know that the element you're looking for is always the only one composed entirely of digits, you can use String#match with look-arounds to extract just the numeric portion of the string.
'http://www.example.com/value/1234/different-value'.match %r{(?<=/)\d+(?=/)}
#=> #<MatchData "1234">
$&
#=> "1234"
The look-behind and look-ahead assertions are needed to anchor the expression to a path segment. Without them, you'll also match digits elsewhere, such as the 3 in a host name like w3.example.com. This solution is a better approach if the position of the target element may change, and if you can guarantee that your element of interest will be the only one that matches the anchored regex.
If there will be more than one match (e.g. http://www.example.com/1234/5678/) then you might want to use String#scan instead to select the first or last match. This is one of those "know your data" things; if you have irregular data, then regular expressions aren't always the best choice.
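As a small sketch of the String#scan suggestion, using the same anchored pattern on the two-number URL from the paragraph above (the variable names are only illustrative):

```ruby
url = 'http://www.example.com/1234/5678/'

# With no capturing groups, scan returns an array of every matched string.
matches = url.scan(%r{(?<=/)\d+(?=/)})
#=> ["1234", "5678"]

first_match = matches.first #=> "1234"
last_match  = matches.last  #=> "5678"
```

Because the look-arounds do not consume the slashes, the slash after 1234 can still serve as the look-behind for 5678.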
Upvotes: 2
Reputation: 26940
Javascript:
var myregexp = /:\/\/.*?\/.*?\/(\d+)/;
var match = myregexp.exec(subject); // subject holds the URL string
var result;
if (match != null) {
  result = match[1];
}
Works with your examples... But I am sure it will fail in general...
Ruby edit:
if subject =~ /:\/\/.*?\/.*?\/(.+?)\//
  match = $~[1]
end
It does work.
Upvotes: 1
Reputation: 336488
/([^/]+)(?=/[^/]+/?$)
should work. You might need to format it differently according to the language you're using. For example, in Ruby, it's
if subject =~ /\/([^\/]+)(?=\/[^\/]+\/?\Z)/
  match = $~[1]
else
  match = ""
end
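For a quick check against both forms of the URL from the question, the same pattern can be wrapped in a method. This is only a sketch; the constant and method names are hypothetical, and %r{...} is used so the slashes need no escaping.

```ruby
# Same pattern as above, written with %r{} instead of /.../.
SEGMENT_BEFORE_LAST = %r{/([^/]+)(?=/[^/]+/?\Z)}

# Hypothetical wrapper: returns the captured segment, or "" on no match,
# mirroring the if/else above.
def segment_before_last(subject)
  m = subject.match(SEGMENT_BEFORE_LAST)
  m ? m[1] : ""
end

segment_before_last('http://www.example.com/value/1234/different-value')
#=> "1234"
segment_before_last('http://www.example.com/value/1234/different-value/')
#=> "1234"
```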
Upvotes: 4