Reputation: 28169
I have a database that contains 5 digit zip codes (ie 10001) and matching state abbreviations (ie NJ, NY, CA). I've found that some of the zip codes have multiple states (ie 10001 = NJ and 10001 = NY) which is wrong.
zip State
10001 NY
10001 NJ
10001 NY
10001 NY
... ...
Each State can have many zip codes, but each zip code should have only one state.
I'd like to find all the errors but can't seem to write a query to do so.
Any suggestions?
Upvotes: 4
Views: 16471
Reputation: 51
I actually work with nationwide datasets on a daily basis and encounter this issue a lot. The State designator in the prefix of the ZIP code indicates the state that the Post Office is located in, not necessarily the extents of the delivery area. I was in contact with some higher-ups at USPS about some issues in the north-central part of the country and was told that the ZIP code program originally intended for the ZIP Codes to be confined by state boundaries, but in the early 80s they started to make exceptions in rural areas. There are cases where a house in North Dakota is 10 miles away from a Montana Post Office, but the nearest Post Office in their own state is located several counties away. This is why these exceptions are made. It makes sense on the ground level, but not on a data level. There are a lot of these exceptions in the USPS databases now. The most prolific areas (that I have found) are along the MT/ND and SD/ND borders.
Upvotes: 5
Reputation: 16035
Just to point out the obvious:
Because ZIP codes are intended for efficient postal delivery, there are unusual cases where a ZIP code crosses state boundaries, such as a military facility spanning multiple states or remote areas of one state most easily serviced from an adjacent state. For example ZIP code 42223 spans Christian KY and Montgomery TN, and ZIP code 97635 spans Lake OR and Modoc CA.
Be careful what you consider canonical data, and always trust someone providing you authentic data.
In this case, 0 is for NJ, so 10001
for NJ would be wrong, but 00001
would be accurate for NJ, and 1 is for NY, so 00001
would be wrong for NY, but 10001
would be accurate for NY. See also http://en.wikipedia.org/wiki/List_of_ZIP_code_prefixes
Also of note is that with the 1000 zip coverages in the previous link, you could accurately determine what zip codes you have that fall outside the range/state that they should be in ...
Upvotes: 20
Reputation: 280431
A different approach, but rather than just give you a count, this gives you the states involved.
SELECT zip, state
FROM dbo.table AS t
WHERE EXISTS
(
SELECT 1 FROM dbo.table
WHERE zip = t.zip AND state <> t.state
)
GROUP BY zip, state
ORDER BY zip, state;
Once you've identified the duplicates and removed them, add a unique constraint on zip,state
so you're not doing this again next week, next month, etc.
Upvotes: 8
Reputation: 10184
This should give you what you need:
select zip,count(distinct state)
from TheTable
group by zip
having count(distinct state)>1
That will give you a list of each zip code for which more than one state exists in the table.
Upvotes: 1