jackhab

Reputation: 17718

catching optional part in regular expression

I have an input text which can be either:

"URL: http://www.cnn.com Cookie: xxx; yyy"

or just:

"URL: http://www.cnn.com"

How do I capture both the URL and the cookie into two separate variables in Python? The part I don't know how to specify is the optional cookie.

Thanks.

Upvotes: 1

Views: 761

Answers (4)

ulrichb

Reputation: 20054

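import re

text = 'URL: http://www.cnn.com Cookie: xxx; yyy'

# Group 2 is the whole optional " Cookie: ..." part; group 3 is the cookie value.
match = re.search(r'URL: (\S+)( Cookie: (.*))?', text)
print(match.group(1))
print(match.group(3))  # None when the Cookie part is absent

Output:

http://www.cnn.com
xxx; yyy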

Upvotes: 1

Laurence Gonsalves

Reputation: 143344

Use separate capture groups, and ? for the optional part of your regex. If a capture group doesn't capture anything, its value will be None.

>>> import re
>>> regex = re.compile(r'URL: (\S+)(?:\s+Cookie: (\S+))?')
>>> regex.match("URL: http://www.cnn.com Cookie: xxx;yyy").groups()
('http://www.cnn.com', 'xxx;yyy')
>>> regex.match("URL: http://www.cnn.com").groups()
('http://www.cnn.com', None)

I've just used \S+ for the URL and cookie patterns in the above for example purposes. Replace them with your real URL and cookie patterns.

Instead of groups() you can use group(1) and group(2). The behavior is the same, but groups() works nicely with unpacking, e.g.:

url, cookie = regex.match("URL: http://www.cnn.com").groups()
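
As a minimal sketch of the approach above applied to the exact input from the question: the cookie there contains a space ("xxx; yyy"), so \S+ would only capture "xxx;". The version below assumes the cookie runs to the end of the line and uses .+ for it. The names LINE_RE and parse_line are hypothetical, chosen only for illustration.

```python
import re

# \S+ stands in for a real URL pattern. The cookie is assumed to run to the
# end of the line, so .+ is used to allow spaces such as in "xxx; yyy".
LINE_RE = re.compile(r'URL: (\S+)(?:\s+Cookie: (.+))?$')

def parse_line(line):
    """Return (url, cookie); cookie is None when the Cookie part is absent."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognized line: %r" % line)
    return match.groups()

url, cookie = parse_line("URL: http://www.cnn.com Cookie: xxx; yyy")
```

The trailing $ forces the pattern to account for the whole line, so text after the URL that is not a Cookie part raises an error instead of being silently dropped.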

Upvotes: 1

infrared

Reputation: 3626

import re

inputstring = "URL: http://www.cnn.com Cookie: xxx; yyy"

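if 'Cookie' in inputstring:
    # Non-greedy (.*?) stops the URL at the first " Cookie: ".
    m = re.match(r'URL: (.*?) Cookie: (.*)', inputstring)
    if m:
        url = m.group(1)
        cookie = m.group(2)
        print(url)
        print(cookie)
else:
    m = re.match(r'URL: (.*)', inputstring)
    if m:
        # group(1) is the URL; group(0) would be the whole match, including "URL: ".
        url = m.group(1)
        print(url)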

Upvotes: 1

Georgi

Reputation: 395

Enclose the optional part in parentheses followed by ?, for example: (Cookie: xxx; yyy)?
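
One way this could look in practice, as a sketch rather than a definitive pattern: the optional part is wrapped in a non-capturing group followed by ?, and named groups make the two values easy to pick out. The group names "url" and "cookie" and the helper extract are assumptions made for this example.

```python
import re

# (?: ... )? makes the whole " Cookie: ..." part optional without adding
# an extra numbered group. The group names are illustrative choices.
pattern = re.compile(r'URL: (?P<url>\S+)(?: Cookie: (?P<cookie>.*))?')

def extract(text):
    """Return a dict with 'url' and 'cookie' keys, or None if nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None

with_cookie = extract("URL: http://www.cnn.com Cookie: xxx; yyy")
without_cookie = extract("URL: http://www.cnn.com")
```

When the optional group does not take part in the match, groupdict() reports its value as None, so the caller can test for a missing cookie with a simple "is None" check.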

Upvotes: 0
