Piyush

Reputation: 1

Get Indian postal codes from address string with other numbers

I am working on an address-parsing project where I need to detect the various components of an address, such as city, state, postal_code, street_no, etc.

I wrote a regular expression to pick out postal codes, tolerating the spaces and hyphens that users type inside them.

import re
sample_add = "16th main road btm layout 560029 5-6-00-76 56 00 78 560-029 25 -000-1"
regexp = re.compile(r"(\d)[ -]*?(\d)[ -]*?(\d)[ -]*?(\d)[ -]*?(\d)[ -]*?(\d)")
# findall returns one tuple of six captured digits per match, so join each tuple
print(["".join(digits) for digits in regexp.findall(sample_add)])

Output :- ['560029', '560076', '560078', '560029', '250001']

This identifies the postal codes in such addresses. However, for an address like the following, it combines the street numbers and interprets them as the postal code:

Ex. sample_add_2 = "House no 323/46 16th main road, btm layout, bengaluru 560029"

In this case, the postal code is identified as 323461, while the correct one should have been 560029.

Upvotes: 0

Views: 1773

Answers (1)

Superluminal

Reputation: 977

If I understood correctly, we are searching for a 6-digit number that may include delimiters such as - or a space, but not /. This should handle it (if not, please explain your desired outcome):

\b(\d[\- ]*){6}\b(?<! )

https://regex101.com/r/wxYgwr/3
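
For use from Python, here is a minimal sketch of how the pattern above might be applied. The helper name find_postal_codes is hypothetical, not from the question. Because the pattern contains a capturing group, re.findall would return only the last digit of each match, so this sketch uses finditer with group(0) and then strips the separators.

```python
import re

# Pattern from the answer above, unchanged: six digits, each optionally
# followed by hyphens or spaces, starting and ending on a word boundary,
# and not ending on a space.
PIN_RE = re.compile(r"\b(\d[\- ]*){6}\b(?<! )")


def find_postal_codes(address):
    """Return every 6-digit candidate in `address` with separators removed.

    Hypothetical helper for illustration only.
    """
    # group(0) is the whole match, e.g. "5-6-00-76"; drop every non-digit.
    return [re.sub(r"\D", "", m.group(0)) for m in PIN_RE.finditer(address)]


sample_add = "16th main road btm layout 560029 5-6-00-76 56 00 78 560-029 25 -000-1"
sample_add_2 = "House no 323/46 16th main road, btm layout, bengaluru 560029"

print(find_postal_codes(sample_add))
# ['560029', '560076', '560078', '560029', '250001']
print(find_postal_codes(sample_add_2))
# ['560029']
```

Note that this still accepts any run of six digits, so other six-digit numbers in an address would also be returned as candidates.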

Upvotes: 0
