Reputation: 5396
I've spent the last few months developing a program that my company is using to clean and geocode addresses on a large scale (~5,000/day). It is functioning adequately well, however, there are certain address formats that I see daily that are causing issues for me.
Addresses with a format such as this park avenue 1
are causing issues with my geocoding. My thought process to tackle this issue is as follows:
avenue, street, road, etc
. I have a list of these delimiters called patterns
.Here is my initial attempt at putting my thoughts into code:
patterns ['my list of delimiters']
address = 'park avenue 1' # this is an example address
address = address.split(' ')
for pattern in patterns:
location = address.index(pattern) + 1
if address[location].isdigit() and len(address[location]) <= 4:
# here is where i'm getting a bit confused
# what would be a good way to go about moving the word to the first position in the list
address = ' '.join(address)
Any help would be appreciated. Thank you folks in advance.
Upvotes: 0
Views: 2271
Reputation: 55479
Here's a modified version of your code. It uses simple list slicing to rearrange the parts of the address list.
Rather than using a for
loop to search for a matching road type it uses set operations.
This code isn't perfect: it won't catch "numbers" like 12a, and it won't handle weird street names like "Avenue Road".
road_patterns = {'avenue', 'street', 'road', 'lane'}
def fix_address(address):
address_list = address.split()
road = road_patterns.intersection(address_list)
if len(road) == 0:
print("Can't find a road pattern in ", address_list)
elif len(road) > 1:
print("Ambiguous road pattern in ", address_list, road)
else:
road = road.pop()
index = address_list.index(road) + 1
if index < len(address_list):
number = address_list[index]
if number.isdigit() and len(number) <= 4:
address_list = [number] + address_list[:index] + address_list[index + 1:]
address = ' '.join(address_list)
return address
addresses = (
'42 tobacco road',
'park avenue 1 a',
'penny lane 17',
'nonum road 12345',
'strange street 23 london',
'baker street 221b',
'37 gasoline alley',
'83 avenue road',
)
for address in addresses:
fixed = fix_address(address)
print('{!r} -> {!r}'.format(address, fixed))
output
'42 tobacco road' -> '42 tobacco road'
'park avenue 1 a' -> '1 park avenue a'
'penny lane 17' -> '17 penny lane'
'nonum road 12345' -> 'nonum road 12345'
'strange street 23 london' -> '23 strange street london'
'baker street 221b' -> 'baker street 221b'
Can't find a road pattern in ['37', 'gasoline', 'alley']
'37 gasoline alley' -> '37 gasoline alley'
Ambiguous road pattern in ['83', 'avenue', 'road'] {'avenue', 'road'}
'83 avenue road' -> '83 avenue road'
Upvotes: 0
Reputation: 13913
Make the string address[location]
into a list by wrapping it in brackets, then concatenate the other pieces.
address = [address[location]] + address[:location] + address[location+1:]
An example:
address = ['park', 'avenue', '1']
location = 2
address = [address[location]] + address[:location] + address[location+1:]
print(' '.join(address)) # => '1 park avenue'
Upvotes: 1