Reputation: 215
I need help creating a Python function that keeps the main street address (usually house number and street name) in the Address field and moves any additional address information (Suite, Unit, Space, PO Box, other details) into Address2.
Here are a few examples of address formats that need to be split:
780 Main Street, P.O. Box 4109 -> 780 Main Street / PO Box 4109
438 University Ave. P.O. Box 5 -> 438 University Ave. / PO Box 5
HIGHWAY 10 BOX 39 -> HIGHWAY 10 / PO Box 39
98 LATHROP ROAD - BOX 147 -> 98 LATHROP ROAD / PO Box 147
396 S MAIN/P.O. BOX 820 -> 396 S MAIN / PO Box 820
HWY 18 AND HWY 128 (BOX 1305) -> HWY 18 AND HWY 128 / PO Box 1305
808 Innisfil Beach Rd Box 2 -> 808 Innisfil Beach Rd / PO Box 2
100 St 101 Ave, P.o. Box 1620 -> 100 St 101 Ave / PO Box 1620
201 Del Rio (p.O. Box 309 -> 201 Del Rio / PO Box 309
BOX 487 2054 HWY 1 EAST -> 2054 HWY 1 EAST / PO Box 487
P O BOX 2820 41340 BIG BEAR BL -> 41340 BIG BEAR BL / PO Box 2820
2813 HWY 15 - P O BOX 1083 -> 2813 HWY 15 / PO Box 1083
P.o. Box 838 2540 Hwy 43 West -> 2540 Hwy 43 West / PO Box 838
I have tried the code below, but it can remove important information from the address and leave PO Box data in the address (it does not move all of the PO Box data into Address2).
input_array = [
'780 Main Street, P.O. Box 410',
'438 University Ave. P.O. Box 5 ',
'HIGHWAY 10 BOX 39',
'98 LATHROP ROAD - BOX 147',
'396 S MAIN/P.O. BOX 820 ',
'HWY 18 AND HWY 128 (BOX 1305)',
'808 Innisfil Beach Rd Box 2',
'100 St 101 Ave, P.o. Box 1620',
'201 Del Rio (p.O. Box 309 ',
'BOX 487 2054 HWY 1 EAST ',
'P O BOX 2820 41340 BIG BEAR BL',
'2813 HWY 15 - P O BOX 1083 ',
'P.o. Box 838 2540 Hwy 43 West'
]
import re
for inputs in input_array:
    inputs = (inputs).lower()
    for a in (inputs.split(' ')):
        if 'box' in a:
            box_index = (inputs.split(' ').index(a))
            box_num = ((inputs.split(' ')[(inputs.split(' ').index(a)) + 1]))
            if (((inputs.split(' ')[(inputs.split(' ').index(a)) + 1])).isdigit()):
                if 'p' in ((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])) or 'o' in ((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])):
                    inputs = inputs.replace(((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])), '')
                else:
                    inputs = inputs.replace(((inputs.split(' ')[(inputs.split(' ').index(a)) + 1])), '')
                inputs = inputs.replace(a, '')
                inputs = inputs.replace('-', '')
                inputs = inputs.replace('/', '')
                inputs = inputs.replace(',', '')
                print ('address => ',inputs,' address2 => ', 'PO Box ', box_num)
                break
I need to improve the function above so that it produces the desired result.
Upvotes: 2
Views: 755
Reputation: 1445
Interesting question. Here's a regex that works for all of your examples, though I can't say for sure that it will cover every address in your project. It's worth reading the regex documentation and experimenting with the pattern to adapt it to your data.
Here's the code:
import re
streets = [
'780 Main Street, P.O. Box 410',
'438 University Ave. P.O. Box 5 ',
'HIGHWAY 10 BOX 39',
'98 LATHROP ROAD - BOX 147',
'396 S MAIN/P.O. BOX 820 ',
'HWY 18 AND HWY 128 (BOX 1305)',
'808 Innisfil Beach Rd Box 2',
'100 St 101 Ave, P.o. Box 1620',
'201 Del Rio (p.O. Box 309 ',
'BOX 487 2054 HWY 1 EAST ',
'P O BOX 2820 41340 BIG BEAR BL',
'2813 HWY 15 - P O BOX 1083 ',
'P.o. Box 838 2540 Hwy 43 West'
]
regex = r'([^a-z0-9]*(p[\s.]?o)?[\s.]*?box (\d+)[^a-z0-9]*)'
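for street in streets:
    match = re.search(regex, street, flags=re.IGNORECASE)
    if match is None:
        # No box number found: leave the address untouched.
        print(street.strip())
        continue
    po_box_chunk = match.group(0)
    po_box_number = match.group(3)
    # str.strip() takes a set of characters, not a substring, so remove the matched chunk explicitly.
    cleaned_address = street.replace(po_box_chunk, ' ').strip()
    result = '{} / PO Box {}'.format(cleaned_address, po_box_number)
    print(result)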
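If you want this as a reusable function that returns Address and Address2 separately, a minimal sketch could look like the one below. The names `PO_BOX` and `split_address` are my own, and the pattern is a somewhat stricter variant of the regex above. That is an assumption on my part: I have only checked it against the sample addresses, so treat it as a starting point rather than a finished parser. It removes only the matched span, so a trailing dot such as the one in "Ave." is kept, and an address like "108 Main St Box 18" does not lose its leading digit.

```python
import re

# Hypothetical pattern (not from the question), written with re.VERBOSE
# so that each part can carry its own comment.
PO_BOX = re.compile(
    r"""
    [\s,/(\-]*            # separators in front of the box part: space , / ( -
    (?<![a-z])            # do not start in the middle of a word (e.g. "Sandbox")
    (?:p[\s.]*o[\s.]*)?   # optional "PO", "P.O.", "P O", "p.O." ...
    box\s*                # the word "box"
    (\d+)                 # the box number (group 1)
    [\s)]*                # trailing spaces or a closing parenthesis
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_address(raw):
    """Return (address, address2); address2 is '' when no box number is found."""
    match = PO_BOX.search(raw)
    if match is None:
        return raw.strip(), ""
    # Cut out only the matched span and join whatever is left on both sides.
    remainder = raw[:match.start()] + " " + raw[match.end():]
    address = " ".join(remainder.split())
    return address, "PO Box " + match.group(1)


print(split_address("P O BOX 2820 41340 BIG BEAR BL"))
# ('41340 BIG BEAR BL', 'PO Box 2820')
```

Returning a tuple keeps the two fields apart, so you can write them to Address and Address2 directly instead of parsing a formatted string again. Only the first box number in an address is handled.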
Upvotes: 1