I am trying to extract an ID number from URLs of the form:
http://www.domain.com/some-slug-here/person/237570
http://www.domain.com/person/237570
Either of these URLs could also have query parameters on them:
http://www.domain.com/some-slug-here/person/237570?q=some+search+string
http://www.domain.com/person/237570?q=some+search+string
I have tried the following expressions to capture the ID value 237570 from the URLs above. Each one partly works, but none of them works across all four URL scenarios:
(?<=person\/)(.*)(?=\?)
(?<=person\/)(.*)(?=\?|\z)
(?<=person\/)(.*)(?=\??*)
What I see is that it captures 237570 but also includes the ? and everything after it in the URL. How can I tell it to stop capturing when it hits a ?, a /, or the end of the string?
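For reference, here is a minimal Python reproduction of the first two attempts. It assumes Python's re module, which spells end of string as \Z rather than \z, so the second pattern uses \Z. The third attempt is left out because Python's re rejects \??* as a repeated quantifier. The names URLS, ATTEMPT_1, ATTEMPT_2 and capture are illustrative only.

```python
import re

# The four URL shapes from the question.
URLS = [
    "http://www.domain.com/some-slug-here/person/237570",
    "http://www.domain.com/person/237570",
    "http://www.domain.com/some-slug-here/person/237570?q=some+search+string",
    "http://www.domain.com/person/237570?q=some+search+string",
]

# Attempt 1: requires a literal "?" after the ID, so URLs without a query fail.
ATTEMPT_1 = re.compile(r"(?<=person/)(.*)(?=\?)")
# Attempt 2, with Python's \Z standing in for \z (end of string).
ATTEMPT_2 = re.compile(r"(?<=person/)(.*)(?=\?|\Z)")


def capture(pattern, url):
    """Return the first captured group, or None if the pattern does not match."""
    match = pattern.search(url)
    return match.group(1) if match else None


for url in URLS:
    print(capture(ATTEMPT_1, url), "|", capture(ATTEMPT_2, url))
```

Attempt 1 finds nothing in the two URLs without a query string. Attempt 2 matches all four, but because .* is greedy and \Z also satisfies the lookahead, it runs to the end of the string and captures "237570?q=some+search+string" whenever a query string is present.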
String:
http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890
Regexp:
person\/(\d{1,})
Output:
>>> import re
>>> regex = re.compile(r'person\/(\d{1,})')
>>> regex.findall(string)
['1234', '3456', '5678', '7890']
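The same pattern also works one URL at a time. The sketch below wraps it in a small helper; extract_person_id is a hypothetical name, and it uses re.search so that a URL without a person ID returns None instead of raising.

```python
import re
from typing import Optional

# Same idea as the answer: "person/" followed by one or more digits.
# \d+ is equivalent to \d{1,}.
PERSON_ID = re.compile(r"person/(\d+)")


def extract_person_id(url: str) -> Optional[str]:
    """Return the digits after 'person/' in a URL, or None if there are none.

    extract_person_id is a hypothetical helper name used for illustration.
    """
    match = PERSON_ID.search(url)
    return match.group(1) if match else None


print(extract_person_id("http://www.domain.com/person/237570?q=some+search+string"))
print(extract_person_id("http://www.domain.com/about"))
```

The capture stops at the first non-digit, so whatever follows the ID (a ?, a /, or nothing) does not need to be matched explicitly.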
Don't use .* to match the ID. The . matches any character (except line breaks, unless you use the DOTALL option), and * is greedy, so .* runs straight past the ? and swallows the query string. Just match a run of digits instead:
(.*) --> (\d+)
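A quick Python check of this suggestion, keeping the lookbehind from the question. The last variant is not from either answer: it is an assumed alternative using a negated character class, which stops literally at a / or ? (or the end of the string), in case an ID were ever not purely numeric.

```python
import re

url = "http://www.domain.com/some-slug-here/person/237570?q=some+search+string"

# With .*, the match is greedy and runs to the end of the line,
# so the query string is captured too.
greedy = re.search(r"(?<=person/)(.*)", url).group(1)

# With \d+, the match stops at the first non-digit character ("?" here).
digits = re.search(r"(?<=person/)(\d+)", url).group(1)

# Assumed alternative (not from the answers): stop at "/" or "?" explicitly
# with a negated character class.
until_delim = re.search(r"(?<=person/)([^/?]+)", url).group(1)

print(greedy)       # 237570?q=some+search+string
print(digits)       # 237570
print(until_delim)  # 237570
```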