Reputation: 5612
Basically I've extracted all the streets for London which has over 500000+ records. The database is using SQL Server 2008. For some streets it has put them in correctly such as 'ABBEY TERRACE' and 'ABBEY VIEW', but for others like 'ABBEY STREET' it has got many duplicated entries of similiar postcodes.
What I would like to do is keep the first 'ABBEY STREET' (SE1 2AN) and 'ABBEY STREET' (SE1 3BU) but remove the other entries. So basically it is looking at the '5th' character and removing the duplicates.
I indexed this so that it removed all of them for the '6th' character which works for postcodes that start with W1 1AA, but not ones that start with W11 1AA.
Is there any way I can remove the duplicats via SQL for postcodes that start with three characters?
Table : Locals
Index Street PC
371582 ABBEY STREET SE1 2AN
371583 ABBEY STREET SE1 2DP
371584 ABBEY STREET SE1 3BU
371585 ABBEY STREET SE1 3DW
371586 ABBEY STREET SE1 3ED
371588 ABBEY STREET SE1 3NJ
371589 ABBEY TERRACE SE2 9EY
371590 ABBEY VIEW NW7 4PB
Upvotes: 1
Views: 163
Reputation: 121952
Try this one -
Query:
DECLARE @temp TABLE
(
[Index] INT
, Street VARCHAR(30)
, PC VARCHAR(10)
)
INSERT INTO @temp ([Index], Street, PC)
VALUES
(371582, 'ABBEY STREET', 'SE1 2AN'),
(371583, 'ABBEY STREET', 'SE1 2DP'),
(371584, 'ABBEY STREET', 'SE1 3BU'),
(371585, 'ABBEY STREET', 'SE1 3DW'),
(371586, 'ABBEY STREET', 'SE1 3ED'),
(371588, 'ABBEY STREET', 'SE1 3NJ'),
(371589, 'ABBEY TERRACE', 'SE2 9EY'),
(371590, 'ABBEY VIEW', 'NW7 4PB')
SELECT t.[Index], t.Street, t.PC
FROM (
SELECT
*
, rn = ROW_NUMBER() OVER (
PARTITION BY Street, CAST(PC AS CHAR(5))
ORDER BY CAST(PC AS CHAR(5))
)
FROM @temp
) t
WHERE rn = 1
Results:
Index Street PC
----------- ------------------------------ ----------
371582 ABBEY STREET SE1 2AN
371584 ABBEY STREET SE1 3BU
371589 ABBEY TERRACE SE2 9EY
371590 ABBEY VIEW NW7 4PB
DELETE statement:
DELETE FROM t
FROM (
SELECT
*
, rn = ROW_NUMBER() OVER (
PARTITION BY Street, CAST(PC AS CHAR(5))
ORDER BY CAST(PC AS CHAR(5))
)
FROM <your_table>
) t
WHERE rn > 1
SELECT *
FROM <your_table>
Upvotes: 2
Reputation: 69789
The following are the allowed formats for a UK Postcode (where A is any letter and 9 is any number):
Format Example
AA9A 9AA EC1A 1BB
A9A 9AA W1A 1HQ
A9 9AA M1 1AA
A99 9AA B33 8TH
AA9 9AA CR2 6XH
AA99 9AA DN55 1PT
Since all variations of UK Postcode end in the same format (Number Letter Letter), you could extract the postcode using:
SUBSTRING(PC, 1, LEN(PC) -2)
i.e. Just remove the last two letters from the postcode. You can then partition your data by this partial postcode:
This method would get the partial columns below from the examples:
Format Example Partial
AA9A 9AA EC1A 1BB EC1A 1
A9A 9AA W1A 1HQ W1A 1
A9 9AA M1 1AA M1 1
A99 9AA B33 8TH B33 8
AA9 9AA CR2 6XH CR2 6
AA99 9AA DN55 1PT DN55 1
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY Street, SUBSTRING(PC, 1, LEN(PC) - 2) ORDER BY PC)
FROM Locals;
Then it is just a case of deleting those rows that aren't the first:
WITH CTE AS
( SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY Street, SUBSTRING(PC, 1, LEN(PC) - 2) ORDER BY PC)
FROM Locals
)
DELETE CTE
WHERE RN > 1;
Note, you may need to change the ORDER BY
within the ROW_NUMBER()
function depending on your exact requirements
Upvotes: 2