python regex pattern to extract value between two characters

Question

I am trying to extract an ID number from URLs of the form:

http://www.domain.com/some-slug-here/person/237570
http://www.domain.com/person/237570

Either of these URLs could also have query parameters on it:

http://www.domain.com/some-slug-here/person/237570?q=some+search+string
http://www.domain.com/person/237570?q=some+search+string

I have tried the following expressions to capture the ID value '237570' from the above URLs. Each one partly works, but none works across all four URL scenarios.

(?<=person\/)(.*)(?=\?)
(?<=person\/)(.*)(?=\?|\z)
(?<=person\/)(.*)(?=\??*)

What I am seeing is that the match gets the 237570 but also includes the ? and the characters that come after it in the URL. How can I say "stop capturing when you hit a ?, a /, or the end of the string"?

d3t0n4t0r · Accepted Answer

String:

http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890

Regexp:

person\/(\d{1,})

Output:

>>> import re
>>> regex = re.compile(r'person\/(\d{1,})')
>>> regex.findall(string)
[u'1234', u'3456', u'5678', u'7890']
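For anyone who wants to run this end to end, here is a minimal sketch under Python 3 (so the results print without the u prefix). The names urls, digits_after_person, until_delimiter and extract_person_id are made up for illustration and are not from the answer. The second pattern is an assumed variant for IDs that are not purely numeric: it captures everything up to the next ?, /, whitespace, or the end of the string, which is the stopping rule the question asks for. A greedy .* with a lookahead that also accepts end-of-string can run straight through the ?, which is likely why the original attempts over-captured.

```python
import re

# Sample URLs copied from the answer above, one per line.
urls = """\
http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890
"""

# Same idea as the answer's pattern: \d+ is equivalent to \d{1,},
# and "/" does not need escaping in a Python regex.
digits_after_person = re.compile(r"person/(\d+)")

# Hypothetical variant (not from the answer): capture any run of characters
# that are not "/", "?" or whitespace, so the match stops at the first
# delimiter or at the end of the string.
until_delimiter = re.compile(r"person/([^/?\s]+)")


def extract_person_id(url):
    """Return the ID following 'person/' in a single URL, or None if absent."""
    match = digits_after_person.search(url)
    return match.group(1) if match else None


ids = digits_after_person.findall(urls)
ids_alt = until_delimiter.findall(urls)
print(ids)      # ['1234', '3456', '5678', '7890']
print(ids_alt)  # ['1234', '3456', '5678', '7890']
```

With a single URL, extract_person_id("http://www.domain.com/person/237570?q=some+search+string") gives '237570', and a URL with no person/ segment gives None.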
