Ishan Bhatt

Reputation: 10269

Regular expression for optional fields in Python

I need to parse a line with a regular expression in which the last two parameters are optional. Here is an example line and the expression I have written.

The exclaim and stop_name fields at the end are optional.

import re

x = re.compile(r'(?P<stop_id>\d{9})\s*(?P<admin_one>[[\x00-\x7F]{6}|\s{6}])\s*'
    r'(?P<service_one>[[\x00-\x7F]{3}|\s{3])\s(?P<line_one>.{8})\s*'
    r'(?P<direction_one>[[\x00-\x7F]{1}|\s{1}])\s*(?P<admin_two>[[\x00-\x7F]{6}|\s{6}])\s*'
    r'(?P<service_two>[[\x00-\x7F]{3}|\s{3])\s(?P<line_two>.{8})\s*'
    r'(?P<direction_two>[[\x00-\x7F]{1}|\s{1}])\s*'
    r'(?P<interchange_time>[[\x00-\x7F]{3}|\s{3}])'
    r'(\s+(?P<exclaim>).{1})?(\s+(?P<stop_name>.+))?')

When I search the following string with it:

m = x.search('071124127 00006_ 022 94N      1 00006_ 022 83N      * 006  Radhuspladsen')

I get the following output from m.groups():

('071124127', '00006_', '022', '94N     ', '1', '00006_', '022', '83N     ',
 '*', '006', '  R', '', None, None)

I need exclaim to be None and stop_name to be Radhuspladsen. How should I write the regex for this?
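A reduced sketch that reproduces the problem, keeping only the tail of the pattern above. Here interchange_time is simplified to \d{3}, an assumption that fits the sample line, and the variable name tail is only for illustration. Because .{1} sits outside the empty exclaim group, the optional group consumes the two spaces plus the R, and the stop_name group then finds no leading whitespace, so it is skipped:

```python
import re

# Tail of the question's pattern only; interchange_time simplified to \d{3}.
# As in the question, .{1} is outside the (empty) exclaim group.
tail = re.compile(r'(?P<interchange_time>\d{3})'
                  r'(\s+(?P<exclaim>).{1})?(\s+(?P<stop_name>.+))?')

m = tail.search('006  Radhuspladsen')
print(m.groups())  # ('006', '  R', '', None, None)
```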

Upvotes: 2

Views: 187

Answers (2)

vks

Reputation: 67998

(?P<stop_id>\d{9})\s*(?P<admin_one>[[\x00-\x7F]{6}|\s{6}])\s*(?P<service_one>[[\x00-\x7F]{3}|\s{3])\s(?P<line_one>.{8})\s*(?P<direction_one>[[\x00-\x7F]{1}|\s{1}])\s*(?P<admin_two>[[\x00-\x7F]{6}|\s{6}])\s*(?P<service_two>[[\x00-\x7F]{3}|\s{3])\s(?P<line_two>.{8})\s*(?P<direction_two>[[\x00-\x7F]{1}|\s{1}])\s*(?P<interchange_time>[[\x00-\x7F]{3}|\s{3}])(?:\s+(?P<exclaim>.{1}(?=\s)))?(?:\s*(?P<stop_name>.+))?

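Try this. It will give you stop_name. The issue was that exclaim was eating up the spaces, so stop_name had no space left to start with. I changed that to \s* so that stop_name can start without a space as well. I also made exclaim match only a single character that is followed by whitespace, (?=\s), so it can no longer take the first letter of the stop name.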

See demo.

http://regex101.com/r/dN8sA5/14
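A runnable sketch of this answer's tail applied to the whole sample line. The name STOP_LINE is hypothetical, and the fixed-width fields are written as plain ASCII classes such as [\x00-\x7F]{6} instead of the question's [[...]|...] bracket expressions; that simplification is an assumption that gives the same fields on this sample line. The second line, with ! in the exclaim position, is made up for illustration:

```python
import re

# Hypothetical name; head simplified to plain ASCII classes, tail from this answer.
STOP_LINE = re.compile(
    r'(?P<stop_id>\d{9})\s*(?P<admin_one>[\x00-\x7F]{6})\s*'
    r'(?P<service_one>[\x00-\x7F]{3})\s(?P<line_one>.{8})\s*'
    r'(?P<direction_one>[\x00-\x7F])\s*(?P<admin_two>[\x00-\x7F]{6})\s*'
    r'(?P<service_two>[\x00-\x7F]{3})\s(?P<line_two>.{8})\s*'
    r'(?P<direction_two>[\x00-\x7F])\s*'
    r'(?P<interchange_time>[\x00-\x7F]{3})'
    # exclaim: one character that must be followed by whitespace
    r'(?:\s+(?P<exclaim>.(?=\s)))?'
    # stop_name: the rest of the line, leading spaces optional
    r'(?:\s*(?P<stop_name>.+))?'
)

line = '071124127 00006_ 022 94N      1 00006_ 022 83N      * 006  Radhuspladsen'
m = STOP_LINE.search(line)
print(m.group('exclaim'), m.group('stop_name'))  # None Radhuspladsen

# Made-up line with "!" as the exclaim character
line_with_exclaim = '071124127 00006_ 022 94N      1 00006_ 022 83N      * 006 ! Radhuspladsen'
m = STOP_LINE.search(line_with_exclaim)
print(m.group('exclaim'), m.group('stop_name'))  # ! Radhuspladsen
```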

Upvotes: 1

Kasravnd

Reputation: 107357

I think the problem is the ? in the last part: you put the question mark outside the parentheses, so it applies to the \s+ too. Put it inside, in the proper position. You also need to remove the .{1} at the end of exclaim. So change it to this:

'(\s+(?P<exclaim>))?(\s+(?P<stop_name>.+)?)'

Demo: http://regex101.com/r/kA8pE8/1
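A small check of this replacement on the tail of the sample line, as a sketch: interchange_time is simplified to \d{3}, and the variable name tail is only for illustration. Since (?P<exclaim>) is an empty group, exclaim comes back as '' or None rather than as a character:

```python
import re

# Tail only: interchange_time simplified to \d{3}, then this answer's replacement.
tail = re.compile(r'(?P<interchange_time>\d{3})'
                  r'(\s+(?P<exclaim>))?(\s+(?P<stop_name>.+)?)')

m = tail.search('006  Radhuspladsen')
print(repr(m.group('exclaim')), m.group('stop_name'))  # '' Radhuspladsen

m2 = tail.search('006 ! Radhuspladsen')
print(repr(m2.group('exclaim')), m2.group('stop_name'))  # None ! Radhuspladsen
```

With this pattern a character such as ! after interchange_time is not captured as exclaim; it ends up at the start of stop_name, as the second call shows.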

Upvotes: 1
