Reputation: 10303
I'm trying to avoid reinventing the wheel when it comes to storing street addresses in a table only once. Uniqueness constraints won't work in some common situations:
100 W 5th Ave
100 West 5th Ave
100 W 5th
200 N 6th Ave Suite 405
200 N 6th Ave #405
I could implement some business logic or a trigger to normalize all fields before inserting and use uniqueness constraints across several fields in the table, but it would be easy to miss some cases with something that varies as much as street addresses.
What would be best would be a universal identifier for each address, perhaps based on GPS coordinates. Before storing a new address look up its GUID and see if the GUID already exists in the Address table.
An organization like Mapquest, the Postal Serice, FedEx, or the US government probably has a system like this.
Has anyone found a good solution to this?
Here's my Address table now (generated by JPA):
CREATE TABLE address
(
id bigint NOT NULL,
"number" character varying(255),
dir character varying(255),
street character varying(255),
"type" character varying(255),
trailingdir character varying(255),
unit character varying(255),
city character varying(255),
state character varying(255),
zip integer,
zip4 integer,
CONSTRAINT address_pkey PRIMARY KEY (id)
)
Upvotes: 5
Views: 1822
Reputation: 10303
I settled on the USC WebGIS service due to their nice web service interface and being easy to sign up for.
Geocodes aren't suitable as a unique key for street addresses, though, for a number of reasons. For example, geocoding cannot distinguish between different units in a condominium complex or apartment building.
I decided to use the parsed address from the geocoding result and put unique constraints on the street number, street name, unit, city, state, and zip. It's not perfect, but it works for what I'm doing.
Upvotes: 0
Reputation: 10303
I wasn't looking for address validation or normalization, although address validation is a good idea. I need a unique identifier for each street address to avoid duplicate records.
It looks like geocoding can provide a solution. With geocoding the input can be a street address and the output will be latitude and longitude coordinates with enough precision to resolve a specific building.
There's a more serious problem with street address ambiguity than I thought. This is from the Wikipedia page on geocoding:
"...there are multiple 100 Washington Streets in Boston, Massachusetts because several cities have been annexed without changing street names."
The Wikipedia page on geocoding has a list of resources (many free) to perform geocoding.
Upvotes: 0
Reputation: 89
You need support for regular expressions like syntax. You can come up with some kind of automata function that will parse tokens and try matching them and then expand or contract them into abbreviations. I'd look into glob() like functions that give support to *? etc on unix as a quick dirty fix.
Upvotes: 0
Reputation: 36977
Look up the address in Google maps and use the spelling they use.
Upvotes: 4