Reputation: 39593
I'm designing a web app (using Google Maps) that will allow users to search for residential postal addresses in my database.
That is, users will provide addresses and I'll store them; later, other users will type in an address to see if that address is in my database.
But addresses are notoriously hard to normalize, and I can't figure out how best to store and query them. (Especially since the terms of use for Google's Geocoder don't let me store its results.)
What's the best approach?
Upvotes: 3
Views: 1021
Reputation: 5412
One way to solve this is by lat-long: use a spatial index such as an R-tree for quick 2-D nearest-neighbour queries. Geospatial indexing comes as standard in MongoDB, and it is certainly available in PostgreSQL, among others.
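As a rough illustration of the lat-long idea, here is a minimal Python sketch that finds stored points within a small radius of a query point. It uses a plain linear scan with the haversine formula rather than an R-tree; in a real database the spatial index would replace the scan. The function names and the `(label, lat, lon)` record layout are assumptions made for this example only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(points, lat, lon, radius_m):
    """Return (distance, label) pairs for points inside radius_m, nearest first.

    points is an iterable of hypothetical (label, lat, lon) records.
    This is a linear scan; a spatial index would avoid visiting every row.
    """
    hits = []
    for label, p_lat, p_lon in points:
        d = haversine_m(lat, lon, p_lat, p_lon)
        if d <= radius_m:
            hits.append((d, label))
    return sorted(hits)
```

The candidates returned this way would still need to be compared by address text, since two distinct addresses can geocode to nearly the same point.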
There's also text matching, described in the SO question "What are ways to match street addresses in SQL Server?"
There seem to be third-party products available as well; see the SO question "I need an address matching algorithm".
If you want to combine these two approaches, look for the term "data fusion", which covers a quite disparate collection of methods that essentially give higher weight to answers that are more certain and base the final answer on the aggregated certainty.
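To make the weighting idea concrete, here is one very simple way it could look in Python: a confidence-weighted average of a geographic score and a text-similarity score. This is only a sketch of the general principle, not a specific data-fusion algorithm; the scoring functions, the 50 m scale, and the confidence values are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def text_score(a, b):
    """Similarity of two address strings in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def geo_score(distance_m, scale_m=50.0):
    """Map a distance in metres to [0, 1]: 1 at zero distance, 0 at scale_m or beyond.

    scale_m is an arbitrary illustrative cut-off, not a recommended value.
    """
    return max(0.0, 1.0 - distance_m / scale_m)


def fused_score(evidence):
    """Confidence-weighted average of (score, confidence) pairs.

    More certain sources pull the result harder; returns 0.0 with no evidence.
    """
    total = sum(conf for _, conf in evidence)
    if total == 0:
        return 0.0
    return sum(score * conf for score, conf in evidence) / total


# Hypothetical usage: trust the text match more than the geocoded distance.
candidate = fused_score([
    (text_score("123 Main St", "123 main st"), 0.7),
    (geo_score(12.0), 0.3),
])
```

A candidate would then be accepted as a match when its fused score exceeds some threshold tuned on real data.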
A description of some GIS research from the Harvard Graduate School of Design could be of interest as well: http://www.gsd.harvard.edu/gis/manual/geocoding/
There's a list of all the cities in the world with their corresponding coordinates: http://www.maxmind.com/en/worldcities
Upvotes: 1
Reputation: 481
You could perhaps use geocoder.us to supplement or replace your use of Google's geocoder. It does a nice job of parsing out the address components; that might help with normalization. There's also a newer version that might be worth looking at to see how it works.
Upvotes: 0
Reputation: 39593
Here's what I'd considered:
1) Geocode the address on input, store the lat/long. When the user does a search, geocode the address and compare lat/longs to see if I have that exact lat/long in my database.
But there are problems with this.
2) Geocode the address on input, but don't store the lat/long; store the address components, and compare those.
This seems better, but there are still problems:
3) Geocode the address, store the lat/long, but don't search for the lat/long exactly. Search within a small radius around the resulting point, looking for possible matches. Compare those possible matches by address components.
This might be the best answer, except that it still violates Google's Geocoder terms of use.
4) Geocode the address on input, get the address components, but just use them to store a parsed normalized postal address in the database.
Add some hand-rolled code to split normalized addresses into even smaller fields (street name, street type, prefix, postfix ...). When the user runs the search, run the same normalization code, then search by fields.
I guess this would work, but rolling my own address parser seems like a recipe for pain. It seems like it just can't possibly be right. (I can't be the first person to need to solve this problem, can I?)
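For what it's worth, the hand-rolled splitting in option 4 might start out something like the Python sketch below. It is a toy: it handles only simple US-style street lines, the abbreviation tables are tiny and hypothetical, and the field names are made up for this example. Real addresses (units, PO boxes, compound directionals, street names that are themselves directions) need far more cases, which is exactly the pain described above.

```python
import re
from typing import NamedTuple

# Tiny, illustrative lookup tables; a real parser needs far larger ones.
STREET_TYPES = {
    "STREET": "ST", "ST": "ST", "AVENUE": "AVE", "AVE": "AVE", "AV": "AVE",
    "ROAD": "RD", "RD": "RD", "BOULEVARD": "BLVD", "BLVD": "BLVD",
    "DRIVE": "DR", "DR": "DR",
}
DIRECTIONS = {
    "NORTH": "N", "N": "N", "SOUTH": "S", "S": "S",
    "EAST": "E", "E": "E", "WEST": "W", "W": "W",
}


class StreetAddress(NamedTuple):
    number: str
    prefix: str
    name: str
    street_type: str
    postfix: str


def normalize(line):
    """Split a street line into canonical fields (toy rules, see caveats above)."""
    # Upper-case, drop punctuation, split on whitespace.
    tokens = re.sub(r"[^\w\s]", "", line.upper()).split()
    number = tokens.pop(0) if tokens and tokens[0][0].isdigit() else ""
    # Peel fields off the ends, always leaving at least one token for the name.
    postfix = DIRECTIONS[tokens.pop()] if len(tokens) > 1 and tokens[-1] in DIRECTIONS else ""
    street_type = STREET_TYPES[tokens.pop()] if len(tokens) > 1 and tokens[-1] in STREET_TYPES else ""
    prefix = DIRECTIONS[tokens.pop(0)] if len(tokens) > 1 and tokens[0] in DIRECTIONS else ""
    return StreetAddress(number, prefix, " ".join(tokens), street_type, postfix)


# The same normalization runs on input and on search, so variants compare equal.
stored = {normalize("123 North Main Street")}
found = normalize("123 N. Main St") in stored
```

Because `StreetAddress` is a plain tuple of fields, it can be used directly as a set or dictionary key here, or mapped onto separate indexed columns in a database.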
Upvotes: 1