John L

Reputation: 103

Python regex pattern to extract a value between two characters

I am trying to extract an ID number from URLs of the form:

http://www.domain.com/some-slug-here/person/237570
http://www.domain.com/person/237570

Either of these URLs could also have query parameters on them:

http://www.domain.com/some-slug-here/person/237570?q=some+search+string
http://www.domain.com/person/237570?q=some+search+string

I have tried the following expressions to capture the ID value '237570' from the above URLs. Each one kind of works, but none of them works across all four URL scenarios.

(?<=person\/)(.*)(?=\?)
(?<=person\/)(.*)(?=\?|\z)
(?<=person\/)(.*)(?=\??*)

What I am seeing is that the match gets the 237570 but also includes the ? and the characters that come after it in the URL. How can I say "stop capturing when you hit a ?, a /, or the end of the string"?

Upvotes: 1

Views: 1318

Answers (2)

d3t0n4t0r

Reputation: 94

String:

http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890

Regexp:

person\/(\d{1,})

Output:

>>> import re
>>> regex = re.compile(r'person\/(\d{1,})')
>>> regex.findall(string)
[u'1234', u'3456', u'5678', u'7890']
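For anyone on Python 3, the same idea can be sketched as a self-contained script. The variable names `urls` and `person_id` are only illustrative, and `\d+` is used as an equivalent spelling of `\d{1,}`:

```python
import re

# Sample input: the four URL shapes from the answer, one per line.
urls = """\
http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890
"""

# Capture the run of digits that immediately follows "person/".
# The "/" does not need escaping in a Python regex.
person_id = re.compile(r"person/(\d+)")

# findall returns only the captured group for every match in the text.
ids = person_id.findall(urls)
print(ids)  # ['1234', '3456', '5678', '7890']
```

Because the group matches digits only, the match stops by itself at a ?, a /, or the end of the string.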

Upvotes: 2

Martin Ender

Reputation: 44289

Don't use .* to match the ID. The . matches any character (except line breaks, unless you use the DOTALL option), so .* can swallow the ? and everything after it. Just match a run of digits instead: (.*) --> (\d+)
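A minimal sketch of the difference, keeping the lookbehind from the question and using its four URLs. The helper name `extract_person_id` is hypothetical and not part of any library:

```python
import re

urls = [
    "http://www.domain.com/some-slug-here/person/237570",
    "http://www.domain.com/person/237570",
    "http://www.domain.com/some-slug-here/person/237570?q=some+search+string",
    "http://www.domain.com/person/237570?q=some+search+string",
]

# Greedy version: ".*" keeps going past the "?" to the end of the string.
greedy = re.compile(r"(?<=person/).*")

# Digits-only version: "\d+" stops at the first non-digit character.
digits_only = re.compile(r"(?<=person/)\d+")


def extract_person_id(url):
    """Return the digits after 'person/' in url, or None if there are none."""
    match = digits_only.search(url)
    return match.group(0) if match else None


for url in urls:
    print(greedy.search(url).group(0), "->", extract_person_id(url))
```

With the digits-only pattern, no lookahead for ? or the end of the string is needed, since neither of those can be matched by `\d`.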

Upvotes: 1
