Reputation: 1
I am working on an address parsing project where I need to detect the various components of an address, such as city, state, postal_code, street_no, etc.
I wrote a regular expression to extract postal codes that tolerates the different ways users type them (for example, with spaces or hyphens between the digits).
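import re
sample_add = "16th main road btm layout 560029 5-6-00-76 56 00 78 560-029 25 -000-1"
regexp = re.compile(r"([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])[ -]*?([\d])")
# findall returns one tuple of six digit strings per match, so join each tuple into a number
print([int("".join(digits)) for digits in regexp.findall(sample_add)])
Output: [560029, 560076, 560078, 560029, 250001]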
This identifies the postal codes in addresses like the one above. However, when an address like the following comes in, it combines the street numbers and interprets them as the postal code:
Ex. `sample_add_2 = "House no 323/46 16th main road, btm layout, bengaluru 560029"`
In this case, the postal code is identified as 323461, while the correct one is 560029.
Upvotes: 0
Views: 1773
Reputation: 977
If I understood it right, we are searching for a 6-digit number which can include some delimiters like `-`, but not `\`. This should handle it. (If not, please explain your desired outcome):
\b(\d[\- ]*){6}\b(?<! )
https://regex101.com/r/wxYgwr/3
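A minimal sketch of how this pattern could be used from Python, assuming the goal is a list of codes with the delimiters stripped. The names `POSTAL_CODE` and `find_postal_codes` are made up for illustration. The group is written as non-capturing here because `re.findall` returns only the group contents when a pattern has a capturing group, and for a repeated group that would be just the last digit.

```python
import re

# Same pattern as above, but with a non-capturing group (?:...) so that
# re.findall returns the whole match rather than only the last repetition.
POSTAL_CODE = re.compile(r"\b(?:\d[\- ]*){6}\b(?<! )")

def find_postal_codes(address):
    """Return the 6-digit codes found in address, with spaces and hyphens removed."""
    return [re.sub(r"[\- ]", "", match) for match in POSTAL_CODE.findall(address)]

sample_add = "16th main road btm layout 560029 5-6-00-76 56 00 78 560-029 25 -000-1"
sample_add_2 = "House no 323/46 16th main road, btm layout, bengaluru 560029"

print(find_postal_codes(sample_add))    # ['560029', '560076', '560078', '560029', '250001']
print(find_postal_codes(sample_add_2))  # ['560029']
```

Note that this still treats any six digits separated only by spaces or hyphens as one code, so an input like "House no 323 461" would be reported as 323461. If that matters, additional context (for example, taking only the last match in the address) may be needed.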
Upvotes: 0