Reputation: 87087

How to best design address locations in any SQL Database?

Overview

I'm working on an Emergency Services reporting and mapping application for California (kind of weird, considering the fires there right now...). We need to map demographic and emergency data for an internal government unit.

What we have are all the streets, cities and neighborhoods in California. Each neighborhood also has its relevant shapefile (the lat/long coordinates that define its boundaries). All of this came from the US Census Bureau website (all public-domain stuff).

Problem

I'm not sure how best to design the DB tables. We haven't been told what type of DB we need to use, so we're open to suggestions if that helps. We have experience with MS SQL Server 2005 and 2008 (and the spatial features in 2008).

The following combinations are all legitimate locations:

Street, City, State
City, State
Neighborhood, State
State

State on its own is a legitimate location because we've been told this might be sold to other states, so we need to plan for that now.

So, originally, I thought of this:

LocationId INTEGER PK Identity
Street NVARCHAR(100)
Neighbourhood NVARCHAR(100)
City NVARCHAR(100)
State NVARCHAR(100)
Latitude VARCHAR(15)
Longitude VARCHAR(15)
Shapefile

None of those are nullable, btw. But after a short while, I thought it was a waste to have so much repeated 'California' or 'San Diego' text in the fields. So I changed the table to be more normalised by making the Neighborhood, City and State fields foreign keys to their own new tables (e.g. lookups), and those two fields are now NULLABLE.

That all works fine, except when I try to write SQL statements against them. Because of the NULLABLE FKs, writing all these outer-join queries is a nightmare :(

What about linking the main table to the sub-lookup tables (e.g. Neighbourhoods, Cities and States) via IDs and then putting all of this in a view? Remember, NeighborhoodID and CityID would be NULLABLE. Would that work?
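The view idea can be sketched as follows, using Python's built-in sqlite3 module only so the example runs without a server (T-SQL on SQL Server 2008 differs in details such as IDENTITY columns). All table, column and view names, and the sample rows, are hypothetical. The outer joins are written once, inside the view, and each of the four location shapes from the list above becomes one row:

```python
import sqlite3

# Hypothetical schema: lookup tables plus a main locations table whose
# neighbourhood_id and city_id are NULLABLE, as in the question.
SCHEMA = """
CREATE TABLE states (state_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE neighbourhoods (neighbourhood_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE locations (
    location_id      INTEGER PRIMARY KEY,
    street           TEXT,  -- NULL when the location is not street-level
    neighbourhood_id INTEGER REFERENCES neighbourhoods(neighbourhood_id),  -- nullable
    city_id          INTEGER REFERENCES cities(city_id),                   -- nullable
    state_id         INTEGER NOT NULL REFERENCES states(state_id)
);
-- The LEFT JOINs live here once, instead of in every report query.
CREATE VIEW location_details AS
SELECT l.location_id,
       l.street,
       n.name AS neighbourhood,
       c.name AS city,
       s.name AS state
FROM locations AS l
JOIN states AS s              ON s.state_id = l.state_id
LEFT JOIN cities AS c         ON c.city_id = l.city_id
LEFT JOIN neighbourhoods AS n ON n.neighbourhood_id = l.neighbourhood_id;
"""

def build_demo():
    """Create an in-memory database holding one row per location shape."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO states VALUES (1, 'California')")
    conn.execute("INSERT INTO cities VALUES (1, 'San Diego')")
    conn.execute("INSERT INTO neighbourhoods VALUES (1, 'La Jolla')")
    conn.executemany(
        "INSERT INTO locations VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Main St", None, 1, 1),  # Street, City, State
            (2, None, None, 1, 1),       # City, State
            (3, None, 1, None, 1),       # Neighbourhood, State
            (4, None, None, None, 1),    # State only
        ],
    )
    return conn

if __name__ == "__main__":
    conn = build_demo()
    for row in conn.execute("SELECT * FROM location_details ORDER BY location_id"):
        print(row)
```

Queries against location_details no longer repeat the joins, although callers still have to cope with NULL neighbourhood and city values.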

I just want to see people's thoughts on this and the reasons behind their suggestions, please. I'm really worried and confused, but eager to learn.

Please help!

edit 1: I need to stick to an RDBMS.

edit 2: I'm thinking about going with either a single (de-normalized) table with a constraint that keeps the combination of fields unique, OR multiple tables with nullable FKs on the main table (e.g. Locations (main table), Neighborhoods, Cities, States ... a normalized schema).

edit 3: Added City to the sample, second list.

edit 4: Added view question.
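The first option in edit 2, a single de-normalized table whose combined fields must be unique, might look roughly like this minimal sqlite3 sketch. The table, the add_location helper and the sample values are hypothetical. Absent parts are stored as empty strings rather than NULL, because SQLite's UNIQUE treats NULLs as distinct and would let duplicate "City, State" rows through:

```python
import sqlite3

# Hypothetical single (de-normalized) locations table. '' marks an absent
# part so that the composite UNIQUE constraint can catch duplicates.
SCHEMA = """
CREATE TABLE locations (
    location_id   INTEGER PRIMARY KEY,
    street        TEXT NOT NULL DEFAULT '',
    neighbourhood TEXT NOT NULL DEFAULT '',
    city          TEXT NOT NULL DEFAULT '',
    state         TEXT NOT NULL,
    UNIQUE (street, neighbourhood, city, state)
);
"""

def add_location(conn, state, city="", neighbourhood="", street=""):
    """Insert a location; raises sqlite3.IntegrityError if it already exists."""
    cur = conn.execute(
        "INSERT INTO locations (street, neighbourhood, city, state) VALUES (?, ?, ?, ?)",
        (street, neighbourhood, city, state),
    )
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_location(conn, "California", city="San Diego")  # City, State
add_location(conn, "California")                    # State only
try:
    add_location(conn, "California", city="San Diego")  # duplicate, rejected
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

How UNIQUE handles NULLs varies between database products, which is why the sketch avoids relying on it.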

Upvotes: 6

Answers (5)

Richard Harrison

Reputation: 19403

Taking the example:

Street, City, State
City, State
Neighborhood, State
State

First, go back to basic principles. All of the above are distinct geospatial entities, so an address is composed of a name and one or more geospatial specifiers. This tells us that we really should store them in a single table. The key here is to think of the data more abstractly.

So your address table needs a one-to-many relationship to another table, called address_entities, which is as follows:

int ID
varchar() name
varchar() type
int parentID
geography position

This means you will need a table to link each address to the address_entities table above. Each geospatial entity is inherently hierarchical (hence parentID). A self-referencing table makes the SQL harder, and personally I try to avoid them, but there are times when it is a good solution and this is one of them.

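The benefits are huge: any mix of streets, neighbourhoods, cities and states is stored the same way, without a fixed set of columns that have to be left empty. Even though it makes the code harder, it is worth it in the long run.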

Also, even when it isn't an immediate requirement, think globally: not all addresses in the world have a street or a state. For example, in France a valid address could be:

- la Maison des Fou
- 24500 Eymet

So, bear that in mind when designing schemas.
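A minimal sqlite3 sketch of the address_entities design, including the French address, is below. The addresses and address_entity_links tables, the type values and the sample rows are assumptions for illustration; the geography position column is left out because SQLite has no spatial type. Walking the parentID hierarchy is done here with a recursive CTE, which is one common way to query a self-referencing table (SQL Server 2005 and later support recursive CTEs too, written without the RECURSIVE keyword):

```python
import sqlite3

# Hypothetical sketch of address_entities: one self-referencing table for
# every kind of geospatial entity, plus a link table to addresses.
SCHEMA = """
CREATE TABLE address_entities (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    type      TEXT NOT NULL,  -- 'state', 'city', 'neighbourhood', 'commune', ...
    parent_id INTEGER REFERENCES address_entities(id)  -- NULL at the top level
);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Link table: one address refers to one or many geospatial entities.
CREATE TABLE address_entity_links (
    address_id INTEGER NOT NULL REFERENCES addresses(id),
    entity_id  INTEGER NOT NULL REFERENCES address_entities(id),
    PRIMARY KEY (address_id, entity_id)
);
"""

# Recursive CTE: start at one entity and climb the parent_id chain.
ANCESTORS = """
WITH RECURSIVE chain(id, name, type, parent_id, depth) AS (
    SELECT id, name, type, parent_id, 0 FROM address_entities WHERE id = ?
    UNION ALL
    SELECT e.id, e.name, e.type, e.parent_id, chain.depth + 1
    FROM address_entities AS e JOIN chain ON e.id = chain.parent_id
)
SELECT type, name FROM chain ORDER BY depth
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO address_entities VALUES (?, ?, ?, ?)",
    [
        (1, "California", "state", None),
        (2, "San Diego", "city", 1),
        (3, "La Jolla", "neighbourhood", 2),
        (4, "Eymet", "commune", None),  # simplified: no street, no state
    ],
)
conn.execute("INSERT INTO addresses VALUES (1, 'la Maison des Fou')")
conn.execute("INSERT INTO address_entity_links VALUES (1, 4)")

print(conn.execute(ANCESTORS, (3,)).fetchall())
# [('neighbourhood', 'La Jolla'), ('city', 'San Diego'), ('state', 'California')]
```

The recursion stops when parent_id is NULL, so a US neighbourhood and a French commune are queried with the same statement.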

Upvotes: 4

Jonathan Leffler

Reputation: 755114

As @Oddthinking noted in a comment, your problems started at:

So I changed the table to be more normalised by making the Neighborhood, City and State fields a foreign key to their own new table (eg. lookups) .. and those two fields are now NULLABLE.

So .. that all works fine. except when I try and do some SQL statements on them. Because of the NULLABLE FK's, it's a nightmare to make all these outer join queries.

This reminds me of the "Doctor, doctor, it hurts when I hit myself like this" joke.

Why exactly did you make the foreign key fields nullable? They were mandatory before, so you should keep them as mandatory, precisely to avoid the nightmares of outer join queries.

Your explanation (question) is somewhat confusing in that you list three fields (Neighborhood, City and State) and then say "those two fields are now nullable". Which two are? And why? And what is in the lookup table? Or is there more than one lookup table?

There might be an argument for some sort of NeighbourhoodID number which is a foreign key to a Neighbourhood table, which defines the City and State as well as the Neighbourhood name. You might then decide that there is a closed list of cities and that the cities have an ID number too, and that number determines the state as well. You are probably as well off using a two-letter state code as creating a (probably 4-byte) state ID number. However, do not forget that a CHECK constraint ensuring that the state code is one of the 50 or so valid state codes is harder to write than a foreign key that references a table of states. Since neither states nor cities change very often, I'd probably use the table of states with a foreign key, but the key column would be the state code.

That means you might have a table of Neighbourhoods with columns NeighbourhoodID, Name, CityID; a table of Cities with columns CityID, Name, State; and a table of States with columns State and Name. You can add other columns as you see fit. Your primary table would then contain a NeighbourhoodID column that is a foreign key to the Neighbourhoods table.
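The structure described above might look like this as a minimal sqlite3 sketch, with every foreign key NOT NULL and States keyed by the two-letter state code. Names are adapted to snake_case and the sample rows are hypothetical. The foreign key to states also does the job that a CHECK constraint listing every valid code would otherwise have to do:

```python
import sqlite3

# Hypothetical sketch: mandatory foreign keys all the way up the hierarchy,
# with the state code itself as the key of the states table.
SCHEMA = """
CREATE TABLE states (
    state TEXT PRIMARY KEY,  -- two-letter code, e.g. 'CA'
    name  TEXT NOT NULL
);
CREATE TABLE cities (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    state   TEXT NOT NULL REFERENCES states(state)
);
CREATE TABLE neighbourhoods (
    neighbourhood_id INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    city_id          INTEGER NOT NULL REFERENCES cities(city_id)
);
CREATE TABLE locations (
    location_id      INTEGER PRIMARY KEY,
    street           TEXT,  -- other columns as needed
    neighbourhood_id INTEGER NOT NULL REFERENCES neighbourhoods(neighbourhood_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO states VALUES ('CA', 'California')")
conn.execute("INSERT INTO cities VALUES (1, 'San Diego', 'CA')")
conn.execute("INSERT INTO neighbourhoods VALUES (1, 'La Jolla', 1)")
conn.execute("INSERT INTO locations VALUES (1, 'Main St', 1)")

# Nothing is nullable, so plain inner joins are enough.
QUERY = """
SELECT l.street, n.name, c.name, s.state
FROM locations AS l
JOIN neighbourhoods AS n ON n.neighbourhood_id = l.neighbourhood_id
JOIN cities AS c         ON c.city_id = n.city_id
JOIN states AS s         ON s.state = c.state
"""
print(conn.execute(QUERY).fetchall())
```

Inserting a city with an unknown state code such as 'ZZ' fails with an IntegrityError, with no hand-maintained list of codes in a CHECK constraint.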

Upvotes: 2

Cade Roux

Reputation: 89741

Is this an OLTP and reporting system, or only a reporting system? If it's only a reporting system, you can denormalize the data in a data warehouse fashion (with or without snowflake dimensions for the hierarchies of geographic jurisdictions), and you'll find the reporting to be easier.

I would start from the results and work back, because it sounds to me like you are being fed the data and are trying to bring it into a database to support the reporting and mapping. In that case, a traditionally normalized schema is not important, because redundancy in the data is not something that will cause maintenance problems for users, etc.

If this seems appropriate, you'll want to look into the Kimball books.
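For the reporting-only case, a minimal star-schema sketch shows why the redundancy is tolerable. The dim_geography and fact_incident tables and the incident data are hypothetical, not taken from the question. The geography dimension repeats state and city names on every row, and a report is one join plus a GROUP BY:

```python
import sqlite3

# Hypothetical Kimball-style star schema. dim_geography is deliberately
# de-normalized: 'California' repeats on every row, which is acceptable
# because the data is loaded in batches rather than edited by users.
SCHEMA = """
CREATE TABLE dim_geography (
    geography_key INTEGER PRIMARY KEY,
    street        TEXT,
    neighbourhood TEXT,
    city          TEXT,
    state         TEXT NOT NULL
);
CREATE TABLE fact_incident (
    incident_id   INTEGER PRIMARY KEY,
    geography_key INTEGER NOT NULL REFERENCES dim_geography(geography_key),
    incident_type TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO dim_geography VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "La Jolla", "San Diego", "California"),
        (2, None, None, "Sacramento", "California"),
    ],
)
conn.executemany(
    "INSERT INTO fact_incident VALUES (?, ?, ?)",
    [(1, 1, "fire"), (2, 1, "medical"), (3, 2, "fire")],
)

# Roll up to whichever geographic level the report needs.
REPORT = """
SELECT g.city, COUNT(*) AS incidents
FROM fact_incident AS f
JOIN dim_geography AS g ON g.geography_key = f.geography_key
GROUP BY g.city
ORDER BY g.city
"""
print(conn.execute(REPORT).fetchall())
# [('Sacramento', 1), ('San Diego', 2)]
```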

Upvotes: 0

Dana the Sane

Reputation: 15208

This is a problem I've had to deal with, and RDBMSs aren't the best at storing hierarchical data. You might want to look at using an object database, since these are built around nested objects and are optimized for the problem.

If you need to use an RDBMS, though, you may have to stick with a de-normalized schema. Having separate tables to maintain your cities, streets, etc. may still be handy for tracking changes. If a city or street needs to be renamed, you can update the master record in the respective table and schedule a job to update a text copy of the string in your 'main' table. This saves you from running updates on tens or hundreds of thousands of rows during prime time, but still lets you store the most up-to-date data in the DB. Of course, this makes the data duplication even worse, but it's the price you pay for performance.
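The rename-then-sync idea can be sketched as two functions: rename_city is cheap and touches only the master row, while sync_city_names is the job you would schedule off-peak. The schema, function names and sample values are hypothetical, and SQLite stands in for whatever RDBMS and job scheduler you actually use:

```python
import sqlite3

# Hypothetical sketch: cities holds the master record; locations keeps a
# de-normalized text copy (city_name) for fast reads, refreshed later.
SCHEMA = """
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE locations (
    location_id INTEGER PRIMARY KEY,
    city_id     INTEGER NOT NULL REFERENCES cities(city_id),
    city_name   TEXT NOT NULL  -- copy of cities.name; may lag until the sync runs
);
"""

def rename_city(conn, city_id, new_name):
    """Cheap, safe at any time: update only the master record."""
    with conn:
        conn.execute("UPDATE cities SET name = ? WHERE city_id = ?", (new_name, city_id))

def sync_city_names(conn):
    """Expensive, run off-peak: refresh stale copies; returns rows changed."""
    with conn:
        cur = conn.execute(
            """
            UPDATE locations
            SET city_name = (SELECT c.name FROM cities AS c WHERE c.city_id = locations.city_id)
            WHERE city_name <> (SELECT c.name FROM cities AS c WHERE c.city_id = locations.city_id)
            """
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO cities VALUES (1, 'Old Name')")
conn.executemany("INSERT INTO locations VALUES (?, 1, 'Old Name')", [(1,), (2,)])
conn.commit()

rename_city(conn, 1, "New Name")
print(sync_city_names(conn))  # 2
```

The WHERE clause limits the job to stale rows, so a re-run after a completed sync changes nothing.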

Upvotes: 1

TAG

Reputation: 1272

This is a nice place to start. A whole #$(#$-load of database schemas to check out:

http://www.databaseanswers.org/data_models/

Upvotes: 1
