Returning both a duplicate count and distinct rows

Question

I'm writing a SQL statement to be used as a PL/SQL cursor, that checks a SUBURB field against a table of POSTCODES provided by the Post Office. The vast majority of suburbs will have one and only one postcode, but:

Some suburbs are misspelled, and do not correspond to a postcode;
Some suburbs have multiple postcodes

To avoid hitting the database any more than necessary, and to avoid having duplicate addresses in my results, I want to return a POSTCODE value if there is one postcode that corresponds to the suburb, null otherwise, and a CODES_IN_SUBURB count that's null if the suburb is not found, but otherwise has the count of how many different postcodes there are for that suburb.

I think I'm close, but I'm having some trouble working out how to modify this statement for the CODES_IN_SUBURB half without repeating myself. If I join without the count = 1 WHERE clause, I get duplicate addresses. If I leave it in, I can't use the CODES_IN_SUBURB the way I want. What's the best way to go about this?

SELECT  dirty_data.street1,
        dirty_data.suburb, 
        nsw_postcodes.postcode
FROM dirty_data 
LEFT JOIN ( SELECT postcodes.locality, postcode, dupes.codes_in_suburb 
            FROM postcodes
            INNER JOIN (SELECT locality, count(*) codes_in_suburb 
                        FROM postcodes 
                        WHERE state = 'NSW' 
                        GROUP BY locality) dupes
            ON dupes.locality = postcodes.locality
            WHERE state = 'NSW' AND dupes.codes_in_suburb = 1) nsw_postcodes
ON nsw_postcodes.locality = dirty_data.suburb;

Gordon Linoff · Accepted Answer

How about a simple aggregation, with some conditional clauses?

SELECT locality,
       (CASE WHEN COUNT(*) = 1 THEN MAX(postcode)
        END) as postcode,
       COUNT(*) as codes_in_suburb 
FROM postcodes 
WHERE state = 'NSW' 
GROUP BY locality;

If there is only one post code in the locality, then MAX(postcode) will return that post code. Otherwise (0 or more than 1) the case will return NULL.

Returning both a duplicate count and distinct rows

Answers (1)

Related Questions