Reputation: 117

Regex - Python matching between string and first occurrence

I'm having a hard time grasping regex no matter how much documentation I read. I'm trying to match everything between a string and the first occurrence of &. This is what I have:

link =  "group.do?sys_id=69adb887157e450051e85118b6ff533c&amp;&"
rex = re.compile("group\.do\?sys_id=(.?)&")
sysid = rex.search(link).groups()[0]

I'm using https://regex101.com/#python to help me validate my regex, and I can kind of get rex = re.compile("user_group.do?sys_id=(.*)&") to work, but the .* is greedy and matches up to the last &, whereas I'm looking to match up to the first &.

I thought .? matches zero or one time.

Upvotes: 2

Answers (3)

Eric

Reputation: 21

.*

is greedy, but

.*?

is not; it is the lazy (non-greedy) form.

.?

only matches any character zero or one time, while

.*?

matches as few characters as possible, stopping at the earliest point where the rest of the pattern can match. I hope that explains it.
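As a rough illustration of the difference, here is a minimal sketch that runs the greedy and lazy forms against the link from the question. The negated character class [^&]* is an extra alternative not mentioned above; it is included only because it also stops at the first &.

```python
import re

link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&amp;&"

# Lazy quantifier: .*? grows only as far as needed to reach the first "&".
lazy = re.search(r"user_group\.do\?sys_id=(.*?)&", link)

# Negated character class (an alternative, not from the answer above):
# [^&]* can never cross an "&", so it also stops at the first one.
negated = re.search(r"user_group\.do\?sys_id=([^&]*)", link)

# Greedy quantifier, for contrast: .* runs on to the last "&".
greedy = re.search(r"user_group\.do\?sys_id=(.*)&", link)

print(lazy.group(1))     # 69adb887157e450051e85118b6ff533c
print(negated.group(1))  # 69adb887157e450051e85118b6ff533c
print(greedy.group(1))   # 69adb887157e450051e85118b6ff533c&amp;
```

Raw strings (the r prefix) keep the backslashes in the pattern from being interpreted by Python before the regex engine sees them.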

Upvotes: 2

Brian

Reputation: 1667

You can simply match up to the &amp; instead of the final & like so:

import re
link =  "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&amp;&"
rex = re.compile("user_group\.do\?sys_id=(.*)&amp;&")
sysid = rex.search(link).groups()[0]

print(sysid)

Upvotes: 2

alecxe

Reputation: 474161

You don't necessarily need regular expressions here. Use urlparse instead:

>>> from urlparse import urlparse, parse_qs 
>>> parse_qs(urlparse(link).query)['sys_id'][0]
'69adb887157e450051e85118b6ff533c'

On Python 3, change the import to:

from urllib.parse import urlparse, parse_qs
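Put together for Python 3, the approach might look like the sketch below. The helper name get_sys_id is hypothetical, and the html.unescape step is an assumption on my part: it treats the &amp; in the link as an HTML-escaped ampersand and decodes it before parsing.

```python
from html import unescape
from urllib.parse import urlparse, parse_qs

link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&amp;&"


def get_sys_id(url):
    """Return the first sys_id query value, or None if it is absent.

    get_sys_id is a hypothetical helper written for this example.
    """
    # Assumption: "&amp;" is an HTML-escaped "&", so decode it first.
    query = urlparse(unescape(url)).query
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(query).get("sys_id")
    return values[0] if values else None


print(get_sys_id(link))  # 69adb887157e450051e85118b6ff533c
```

Using .get() instead of indexing avoids a KeyError when the link has no sys_id parameter.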

Upvotes: 7
