Reputation: 11162

How to extract zipcode in the middle of a string?

Here is an address:

address = "35 rue de trucmuche, 75009 PARIS"

I want to extract the zipcode (75009) in the address using a Regex.

I tried this:

reg = re.compile('^.*(P<zipcode>\d{5}).*$')
match = reg.match(address)
match.groupdict().zipcode # should be 75009

I get a:

AttributeError: 'NoneType' object has no attribute 'groupdict'

I think my Regex is wrong. I can't understand why.

Upvotes: 3

Answers (3)

Reputation: 746

Your regex is wrong, so it does not match: reg.match() returns None, and calling groupdict() on None raises the AttributeError.

In fact there are two mistakes as far as I can see.

reg = re.compile('^.*(?P<zipcode>\d{5}).*$')

(the named group needs a '?' in front of the 'P')

The other mistake that will come up is that the result of groupdict() needs to be accessed like a normal dict, that is

match.groupdict()['zipcode']

You should probably also put a check to see that the match matches, e.g.

if match:
    zipcode = match.groupdict()['zipcode']

This works because, according to https://docs.python.org/2/library/re.html#match-objects, a match object always has a boolean value of True, whereas match() returns None when the pattern does not match.
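As a minimal sketch of that check (not the only way to do it), the lookup can be wrapped in a small helper. The names extract_zipcode and ZIPCODE_RE are made up for illustration, and the pattern is an assumption: it uses search() with word boundaries instead of the question's '^.*...$' form, and a raw string so the backslashes reach the regex engine unchanged.

```python
import re

# Hypothetical pattern for illustration: a standalone run of exactly five digits.
# The raw string (r'...') keeps \b and \d from being treated as string escapes.
ZIPCODE_RE = re.compile(r'\b(?P<zipcode>\d{5})\b')


def extract_zipcode(address):
    """Return the first standalone five-digit run in address, or None."""
    match = ZIPCODE_RE.search(address)
    if match is None:
        # search() returns None when nothing matches, so bail out early
        # instead of calling a method on None.
        return None
    return match.group('zipcode')


address = "35 rue de trucmuche, 75009 PARIS"
print(extract_zipcode(address))
print(extract_zipcode("no digits here"))
```

Returning None for a failed match leaves it to the caller to decide what a missing zipcode means, rather than raising the AttributeError from the question.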

Upvotes: 0

Reputation: 4887

You are just missing the ? in the named capturing group:

^.*(?P<zipcode>\d{5}).*$

reg = re.compile('^.*(?P<zipcode>\d{5}).*$')
match = reg.match(address)
# groupdict() returns a plain dict, so index it by key
match.groupdict()['zipcode'] # should be '75009'

Upvotes: 5

Reputation:

A named capture group in Python must start with ?P, as in (?P<name>...):

>>> import re
>>> address = "35 rue de trucmuche, 75009 PARIS"
>>> re.match('^.*(?P<zipcode>\d{5}).*$', address).groupdict()['zipcode']
'75009'

Otherwise, you will be trying to match the literal text P<zipcode> followed by five digits.

Also, the .groupdict() method returns a normal Python dictionary:

>>> type(re.match('^.*(?P<zipcode>\d{5}).*$', address).groupdict())
<class 'dict'>

This means that you will need to access the zipcode value as dct['zipcode'], not dct.zipcode.
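To make both points concrete, here is a small sketch comparing the pattern from the question with the corrected one. The variable names and the second test string are invented for illustration; the patterns themselves are the ones discussed above, written as raw strings.

```python
import re

address = "35 rue de trucmuche, 75009 PARIS"

# Without '?', this is an ordinary capture group that matches the literal
# characters "P<zipcode>" followed by five digits.
broken = re.compile(r'^.*(P<zipcode>\d{5}).*$')
no_match = broken.match(address)                       # None for a real address
literal_match = broken.match("x P<zipcode>75009 y")    # made-up input that does match

# With '?P<zipcode>', the group is named and captures only the digits.
fixed = re.compile(r'^.*(?P<zipcode>\d{5}).*$')
groups = fixed.match(address).groupdict()

zipcode = groups['zipcode']                            # key access on a plain dict
same_zipcode = fixed.match(address).group('zipcode')   # equivalent shortcut
has_attribute = hasattr(groups, 'zipcode')             # dicts have no such attribute

print(no_match, literal_match.group(1), zipcode, same_zipcode, has_attribute)
```

match.group('zipcode') avoids building the dictionary when only one group is needed.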

Upvotes: 4