Harrison

Reputation: 5396

Rearrange position of words in string conditionally

I've spent the last few months developing a program that my company uses to clean and geocode addresses on a large scale (~5,000/day). It works reasonably well, but certain address formats that I see daily are causing problems.

Addresses in a format such as 'park avenue 1', where the number follows the street name, are causing issues with my geocoding. My plan for tackling this is as follows:

  1. Split the address into a list
  2. Find the index of the delimiter word in the list. The delimiters are words such as avenue, street, road, etc.; I have a list of them called patterns.
  3. Check whether the word immediately following the delimiter consists only of digits and is 4 or fewer characters long. If it is longer than 4 digits, it is likely a zip code, which I do not need. If it has 4 or fewer digits, it is most likely the house number.
  4. If the word meets the criteria that I explained in the previous step, I need to move it to the first position in the list.
  5. Finally, I will put the list back together into a string.

Here is my initial attempt at putting my thoughts into code:

patterns = ['my list of delimiters']
address = 'park avenue 1'    # this is an example address
address = address.split(' ')
for pattern in patterns:
    location = address.index(pattern) + 1
    if address[location].isdigit() and len(address[location]) <= 4:
        # here is where i'm getting a bit confused
        # what would be a good way to go about moving the word to the first position in the list
address = ' '.join(address)

Any help would be appreciated. Thank you folks in advance.

Upvotes: 0

Views: 2271

Answers (2)

PM 2Ring

Reputation: 55479

Here's a modified version of your code. It uses simple list slicing to rearrange the parts of the address list.

Rather than using a for loop to search for a matching road type, it uses a set intersection.

This code isn't perfect: it won't catch "numbers" like 12a, and it won't handle weird street names like "Avenue Road".

road_patterns = {'avenue', 'street', 'road', 'lane'}

def fix_address(address):
    address_list = address.split()
    road = road_patterns.intersection(address_list)
    if len(road) == 0:
        print("Can't find a road pattern in ", address_list)
    elif len(road) > 1:
        print("Ambiguous road pattern in ", address_list, road)
    else:
        road = road.pop()
        index = address_list.index(road) + 1
        if index < len(address_list):
            number = address_list[index]
            if number.isdigit() and len(number) <= 4:
                address_list = [number] + address_list[:index] + address_list[index + 1:]
                address = ' '.join(address_list)
    return address

addresses = (
    '42 tobacco road',
    'park avenue 1 a',
    'penny lane 17',
    'nonum road 12345',
    'strange street 23 london',
    'baker street 221b',
    '37 gasoline alley',
    '83 avenue road',
)

for address in addresses:
    fixed = fix_address(address)
    print('{!r} -> {!r}'.format(address, fixed))

Output:

'42 tobacco road' -> '42 tobacco road'
'park avenue 1 a' -> '1 park avenue a'
'penny lane 17' -> '17 penny lane'
'nonum road 12345' -> 'nonum road 12345'
'strange street 23 london' -> '23 strange street london'
'baker street 221b' -> 'baker street 221b'
Can't find a road pattern in  ['37', 'gasoline', 'alley']
'37 gasoline alley' -> '37 gasoline alley'
Ambiguous road pattern in  ['83', 'avenue', 'road'] {'avenue', 'road'}
'83 avenue road' -> '83 avenue road'
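
The first limitation noted above (numbers like 12a) could probably be relaxed by swapping the isdigit() test for a regular expression that also accepts one trailing letter. The following is only a sketch: the names HOUSE_NUMBER and fix_address_suffix are made up for this illustration, and the rule "1 to 4 digits plus an optional letter" is an assumption about what counts as a house number, not something stated in the question. Unlike fix_address, it stays silent when no road word or more than one road word is found.

```python
import re

road_patterns = {'avenue', 'street', 'road', 'lane'}

# Assumption: a house number is 1-4 digits, optionally followed by one letter.
HOUSE_NUMBER = re.compile(r'\d{1,4}[a-z]?', re.IGNORECASE)

def fix_address_suffix(address):
    words = address.split()
    roads = road_patterns.intersection(words)
    # Leave the address alone unless exactly one road word is present.
    if len(roads) != 1:
        return address
    index = words.index(roads.pop()) + 1
    if index < len(words) and HOUSE_NUMBER.fullmatch(words[index]):
        # Move the house number to the front, keeping the rest in order.
        words = [words[index]] + words[:index] + words[index + 1:]
    return ' '.join(words)

print(fix_address_suffix('baker street 221b'))   # 221b baker street
print(fix_address_suffix('nonum road 12345'))    # nonum road 12345
```

Using fullmatch rather than match means a 5-digit zip code is still rejected, because the whole word has to fit the pattern.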

Upvotes: 0

Alicia Garcia-Raboso

Reputation: 13913

Make the string address[location] into a list by wrapping it in brackets, then concatenate the other pieces.

address = [address[location]] + address[:location] + address[location+1:]

An example:

address = ['park', 'avenue', '1']
location = 2
address = [address[location]] + address[:location] + address[location+1:]

print(' '.join(address)) # => '1 park avenue'
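
If building a new list is not needed, the same move can be done in place with the list methods pop and insert. This is a small sketch of that alternative using the same example values as above, not a change to the approach itself:

```python
address = ['park', 'avenue', '1']
location = 2

# pop() removes and returns the item at `location`; insert(0, ...) puts it first.
address.insert(0, address.pop(location))

print(' '.join(address))  # 1 park avenue
```

This version modifies the existing list, whereas the slicing version leaves the original list untouched and binds the name to a new one.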

Upvotes: 1
