Abha Rana

Reputation: 215

python: Splitting Main address into primary and secondary addresses

I need help creating a Python function that keeps the main street address (usually the house number and street name) in the Address field and moves any additional address information (Suite, Unit, Space, PO Box, or other details) to Address2.

Here are a few examples of address formats that need to be split:

780 Main Street, P.O. Box 4109 -> 780 Main Street / PO Box 4109

438 University Ave. P.O. Box 5 -> 438 University Ave. / PO Box 5

HIGHWAY 10 BOX 39 -> HIGHWAY 10 / PO Box 39

98 LATHROP ROAD - BOX 147 -> 98 LATHROP ROAD / PO Box 147

396 S MAIN/P.O. BOX 820 -> 396 S MAIN / PO Box 820

HWY 18 AND HWY 128 (BOX 1305) -> HWY 18 AND HWY 128 / PO Box 1305

808 Innisfil Beach Rd Box 2 -> 808 Innisfil Beach Rd / PO Box 2

100 St 101 Ave, P.o. Box 1620 -> 100 St 101 Ave / PO Box 1620

201 Del Rio (p.O. Box 309 -> 201 Del Rio / PO Box 309

BOX 487 2054 HWY 1 EAST -> 2054 HWY 1 EAST / PO Box 487

P O BOX 2820 41340 BIG BEAR BL -> 41340 BIG BEAR BL / PO Box 2820

2813 HWY 15 - P O BOX 1083 -> 2813 HWY 15 / PO Box 1083

P.o. Box 838 2540 Hwy 43 West -> 2540 Hwy 43 West / PO Box 838

I have tried the code below, but it can remove important information from the address, and it leaves PO Box data in the address instead of moving all of it into Address2.

input_array = [
    '780 Main Street, P.O. Box 410',        
    '438 University Ave. P.O. Box 5 ',        
    'HIGHWAY 10 BOX 39',         
    '98 LATHROP ROAD - BOX 147',         
    '396 S MAIN/P.O. BOX 820 ',       
    'HWY 18 AND HWY 128 (BOX 1305)',     
    '808 Innisfil Beach Rd Box 2',       
    '100 St 101 Ave, P.o. Box 1620',       
    '201 Del Rio (p.O. Box 309 ',       
    'BOX 487 2054 HWY 1 EAST ',       
    'P O BOX 2820 41340 BIG BEAR BL',        
    '2813 HWY 15 - P O BOX 1083 ',        
    'P.o. Box 838 2540 Hwy 43 West' 
]


import re
for inputs in input_array:
    inputs = (inputs).lower()
    for a in (inputs.split(' ')):
        if 'box' in a:
            box_index = (inputs.split(' ').index(a))
            box_num = ((inputs.split(' ')[(inputs.split(' ').index(a)) + 1]))
            if (((inputs.split(' ')[(inputs.split(' ').index(a)) + 1])).isdigit()):
                if 'p' in ((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])) or 'o' in ((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])):
                    inputs = inputs.replace(((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])), '')
                else:
                    inputs = inputs.replace(((inputs.split(' ')[(inputs.split(' ').index(a)) + 1])), '')
                    inputs = inputs.replace(a, '')
                    inputs = inputs.replace('-', '')
                    inputs = inputs.replace('/', '')
                    inputs = inputs.replace(',', '')
                print ('address => ',inputs,'    address2 => ', 'PO Box ', box_num)
            break

The code above needs improvement so that it produces the desired result.

Upvotes: 2

Views: 755

Answers (1)

Michael Savchenko

Reputation: 1445

An interesting question. Here's a regex that matches the PO Box part in all of your examples, but I can't say for sure whether it will cover every address in your project. Read more of the regex documentation and play with regular expressions here.

Here's the code:

import re

streets = [
    '780 Main Street, P.O. Box 410',
    '438 University Ave. P.O. Box 5 ',
    'HIGHWAY 10 BOX 39',
    '98 LATHROP ROAD - BOX 147',
    '396 S MAIN/P.O. BOX 820 ',
    'HWY 18 AND HWY 128 (BOX 1305)',
    '808 Innisfil Beach Rd Box 2',
    '100 St 101 Ave, P.o. Box 1620',
    '201 Del Rio (p.O. Box 309 ',
    'BOX 487 2054 HWY 1 EAST ',
    'P O BOX 2820 41340 BIG BEAR BL',
    '2813 HWY 15 - P O BOX 1083 ',
    'P.o. Box 838 2540 Hwy 43 West'
]

regex = r'([^a-z0-9]*(p[\s.]?o)?[\s.]*?box (\d+)[^a-z0-9]*)'

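for street in streets:
    match = re.search(regex, street, flags=re.IGNORECASE)
    po_box_chunk = match.group(0)
    po_box_number = match.group(3)
    # str.strip() removes a set of characters, not a substring, so it can
    # eat parts of the address; cut out the matched chunk instead.
    cleaned_address = street.replace(po_box_chunk, ' ').strip()
    result = '{} / PO Box {}'.format(cleaned_address, po_box_number)

    print(result)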
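
Since the question asks for a function that fills Address and Address2, here is a minimal sketch of how the same pattern could be wrapped into one. The names `PO_BOX_RE`, `split_address` and the group name `number` are my own choices for illustration, and returning an empty string for Address2 when no PO Box is found is an assumption about what you want, not something taken from the question.

```python
import re

# Same pattern as above, written in verbose mode so each part is commented.
# PO_BOX_RE and the group name "number" are illustrative names.
PO_BOX_RE = re.compile(
    r"""
    [^a-z0-9]*          # leading separators: spaces, commas, dashes, slashes, brackets
    (?:p[\s.]?o)?       # optional "PO", "P.O" or "P O"
    [\s.]*?             # optional dots or spaces before "box"
    box\s+              # the word "box" followed by whitespace
    (?P<number>\d+)     # the box number
    [^a-z0-9]*          # trailing separators
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_address(raw):
    """Return (address, address2); address2 is '' when no PO Box is found."""
    match = PO_BOX_RE.search(raw)
    if match is None:
        # Assumption: addresses without a PO Box are passed through unchanged.
        return raw.strip(), ""
    # Cut out only the matched span, then collapse leftover whitespace.
    remainder = raw[:match.start()] + " " + raw[match.end():]
    address = " ".join(remainder.split())
    return address, "PO Box {}".format(match.group("number"))


for street in ["100 St 101 Ave, P.o. Box 1620", "BOX 487 2054 HWY 1 EAST "]:
    print(split_address(street))
```

Slicing around `match.start()` and `match.end()` removes exactly the matched text and nothing else. Note that the leading separators are part of the match, so a trailing period such as the one in "Ave." is dropped along with the PO Box chunk.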

Upvotes: 1
