nsilva
nsilva

Reputation: 5612

Removing duplicated postcode entries

Basically I've extracted all the streets for London which has over 500000+ records. The database is using SQL Server 2008. For some streets it has put them in correctly such as 'ABBEY TERRACE' and 'ABBEY VIEW', but for others like 'ABBEY STREET' it has got many duplicated entries of similiar postcodes.

What I would like to do is keep the first 'ABBEY STREET' (SE1 2AN) and 'ABBEY STREET' (SE1 3BU) but remove the other entries. So basically it is looking at the '5th' character and removing the duplicates.

I indexed this so that it removed all of them for the '6th' character which works for postcodes that start with W1 1AA, but not ones that start with W11 1AA.

Is there any way I can remove the duplicats via SQL for postcodes that start with three characters?

Table : Locals

Index       Street          PC        
371582   ABBEY STREET     SE1 2AN
371583   ABBEY STREET     SE1 2DP
371584   ABBEY STREET     SE1 3BU
371585   ABBEY STREET     SE1 3DW
371586   ABBEY STREET     SE1 3ED
371588   ABBEY STREET     SE1 3NJ
371589   ABBEY TERRACE    SE2 9EY
371590   ABBEY VIEW       NW7 4PB

Upvotes: 1

Views: 163

Answers (2)

Devart
Devart

Reputation: 121952

Try this one -

Query:

DECLARE @temp TABLE
(
      [Index] INT
    , Street VARCHAR(30)
    , PC VARCHAR(10)
)

INSERT INTO @temp ([Index], Street, PC)
VALUES   
    (371582, 'ABBEY STREET', 'SE1 2AN'),
    (371583, 'ABBEY STREET', 'SE1 2DP'),
    (371584, 'ABBEY STREET', 'SE1 3BU'),
    (371585, 'ABBEY STREET', 'SE1 3DW'),
    (371586, 'ABBEY STREET', 'SE1 3ED'),
    (371588, 'ABBEY STREET', 'SE1 3NJ'),
    (371589, 'ABBEY TERRACE', 'SE2 9EY'),
    (371590, 'ABBEY VIEW', 'NW7 4PB')

SELECT t.[Index], t.Street, t.PC
FROM (
    SELECT 
          *
        , rn = ROW_NUMBER() OVER (
                    PARTITION BY Street, CAST(PC AS CHAR(5)) 
                    ORDER BY CAST(PC AS CHAR(5)) 
            )
    FROM @temp
) t
WHERE rn = 1

Results:

Index       Street                         PC
----------- ------------------------------ ----------
371582      ABBEY STREET                   SE1 2AN
371584      ABBEY STREET                   SE1 3BU
371589      ABBEY TERRACE                  SE2 9EY
371590      ABBEY VIEW                     NW7 4PB

DELETE statement:

DELETE FROM t
FROM (
    SELECT 
          *
        , rn = ROW_NUMBER() OVER (
                    PARTITION BY Street, CAST(PC AS CHAR(5)) 
                    ORDER BY CAST(PC AS CHAR(5)) 
            )
    FROM <your_table>
) t 
WHERE rn > 1

SELECT * 
FROM <your_table>

Upvotes: 2

GarethD
GarethD

Reputation: 69789

The following are the allowed formats for a UK Postcode (where A is any letter and 9 is any number):

Format      Example
AA9A 9AA    EC1A 1BB
A9A 9AA     W1A 1HQ
A9 9AA      M1 1AA
A99 9AA     B33 8TH
AA9 9AA     CR2 6XH
AA99 9AA    DN55 1PT

Since all variations of UK Postcode end in the same format (Number Letter Letter), you could extract the postcode using:

SUBSTRING(PC, 1, LEN(PC) -2)

i.e. Just remove the last two letters from the postcode. You can then partition your data by this partial postcode:

This method would get the partial columns below from the examples:

Format      Example     Partial
AA9A 9AA    EC1A 1BB    EC1A 1
A9A 9AA     W1A 1HQ     W1A 1
A9 9AA      M1 1AA      M1 1
A99 9AA     B33 8TH     B33 8
AA9 9AA     CR2 6XH     CR2 6
AA99 9AA    DN55 1PT    DN55 1


SELECT  *,
        RN = ROW_NUMBER() OVER(PARTITION BY Street, SUBSTRING(PC, 1, LEN(PC) - 2) ORDER BY PC)
FROM    Locals;

Then it is just a case of deleting those rows that aren't the first:

WITH CTE AS
(   SELECT  *,
            RN = ROW_NUMBER() OVER(PARTITION BY Street, SUBSTRING(PC, 1, LEN(PC) - 2) ORDER BY PC)
    FROM    Locals
)
DELETE  CTE
WHERE   RN > 1;

Example on SQL Fiddle

Note, you may need to change the ORDER BY within the ROW_NUMBER() function depending on your exact requirements

Upvotes: 2

Related Questions