Reputation: 85
I have a table, [Customer], which has [postcode] and [totalValue] fields.
I need to design a query that returns the first two characters of all [postcode], and then a SUM of the [totalValue] for that particular postcode group (so one value each for NW, L1, etc.).
I have this code, which is running fine:
SELECT LEFT(postcode, 2), SUM(totalValue)
FROM Customer
GROUP BY LEFT(postcode, 2)
...however I have entries in [Customer] that don't contain a valid [postcode].
A valid UK postcode is always either one or two letters followed by a number (e.g. LE1, L12, etc.).
I would like to filter out all the incorrect/empty/Null [postcode] records into a separate record entry, but this is beyond my skillset.
Upvotes: 2
Views: 630
Reputation: 69759
Your assumptions about valid postcodes is slightly out
A valid UK postcode is always either one or two letters followed by a number (e.g. LE1, L12, etc.).
In its simplest terms the valid formats for a UK Postcode are:
+----------+---------------------------------------------+----------+
| Format | Coverage | Example |
+----------+---------------------------------------------+----------+
| AA9A 9AA | WC postcode area; EC1–EC4, NW1W, SE1P, SW1 | EC1A 1BB |
+----------+---------------------------------------------+----------+
| A9A 9AA | E1W, N1C, N1P | W1A 0AX |
+----------+---------------------------------------------+----------+
| A9 9AA | B, E, G, L, M, N, S, W | M1 1AE |
| A99 9AA | | B33 8TH |
+----------+---------------------------------------------+----------+
| AA9 9AA | All other postcodes | CR2 6XH |
| AA99 9AA | | DN55 1PT |
+----------+---------------------------------------------+----------+
Each of which you can define a pattern match for:
AA9A 9AA - [A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]
A9A 9AA - [A-Z][0-9][A-Z] [0-9][A-Z][A-Z]
A9 9AA - [A-Z][0-9] [0-9][A-Z][A-Z]
A99 9AA - [A-Z][0-9][0-9] [0-9][A-Z][A-Z]
AA9 9AA - [A-Z][A-Z][0-9] [0-9][A-Z][A-Z]
AA99 9AA - [A-Z][A-Z][0-9][0-9] [0-9][A-Z][A-Z]
For something as re-usable as this, I think it is worth storing, so I would create a table for it:
CREATE TABLE dbo.SimplePostCodeValidation
(
PostCode VARCHAR(8) NOT NULL,
Pattern VARCHAR(50) NOT NULL
);
INSERT dbo.SimplePostCodeValidation (PostCode, Pattern)
VALUES
('AA9A 9AA', '[A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'),
('A9A 9AA', '[A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'),
('A9 9AA', '[A-Z][0-9] [0-9][A-Z][A-Z]'),
('A99 9AA', '[A-Z][0-9][0-9] [0-9][A-Z][A-Z]'),
('AA9 9AA', '[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]'),
('AA99 9AA', '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]'),
-- REPEAT THE POSTCODES WITHOUT SPACES
('AA9A9AA', '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]'),
('A9A9AA', '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]'),
('A99AA', '[A-Z][0-9][0-9][A-Z][A-Z]'),
('A999AA', '[A-Z][0-9][0-9][0-9][A-Z][A-Z]'),
('AA99AA', '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]'),
('AA999AA', '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]');
Now you can easily validate your postcodes:
DECLARE @T TABLE (Postcode VARCHAR(8));
INSERT @T (PostCode)
SELECT PostCode
FROM dbo.SimplePostCodeValidation
UNION ALL
SELECT PostCode
FROM (VALUES ('123456'), (''), ('TEST')) t (PostCode);
SELECT t.PostCode, IsValid = CASE WHEN pc.PostCode IS NULL THEN 0 ELSE 1 END
FROM @T AS t
LEFT JOIN SimplePostCodeValidation AS pc
ON t.PostCode LIKE pc.Pattern;
Which returns:
PostCode IsValid
----------------------
AA9A 9AA 1
A9A 9AA 1
A9 9AA 1
A99 9AA 1
AA9 9AA 1
AA99 9AA 1
123456 0
0
TEST 0
To apply this to your situation you would use:
SELECT CASE WHEN pc.PostCode IS NULL THEN 'Invalid' ELSE LEFT(c.postcode, 2) END,
TotalValue = SUM(totalValue)
FROM Customer AS c
LEFT JOIN SimplePostCodeValidation AS pc
ON t.PostCode LIKE pc.Pattern;
GROUP BY CASE WHEN pc.PostCode IS NULL THEN 'Invalid' ELSE LEFT(c.postcode, 2) END;
If you want to get more complicated, there are actually further limitations to what is a valid postcode, e.g. if it is the pattern A9 9AA
then the first letter can only be one of (B, E, G, L, M, N, S, W). The guidelines set out on wikipedia state:
Since SQL Server does not support full regex it gets a bit more complicated to account for all these caveats. If you really wanted fool proof validation I would be inclined to use the regex from answers this question, and use a CLR function to validate the postcode.
Upvotes: 1
Reputation: 2733
Query:
WITH CTE AS (SELECT CASE WHEN postcode LIKE('[A-Z][A-Z0-9][0-9]%')
THEN LEFT(postcode,2) ELSE 'Invalid' END AS PostCode
,totalValue
FROM Customer)
SELECT PostCode, SUM(totalValue) as totalValue
FROM CTE
GROUP BY PostCode
Uses a CTE
to gather the data. Invalid postcodes will be grouped into 'Invalid'.
Upvotes: 0
Reputation: 1618
Try this. All incorrect (or NULL) postcodes will be accumulated in '--' entry:
SELECT
CASE
WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
THEN LEFT(postcode, 2)
ELSE '--'
END,
SUM(totalValue)
FROM Customer
GROUP BY
CASE
WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
THEN LEFT(postcode, 2)
ELSE '--'
END
Upvotes: 0
Reputation: 1924
This is the select:
SELECT LEFT(postcode, 2), SUM(totalValue)
FROM Customer
WHERE postcode IS NOT NULL AND postcode != '' AND dbo.fn_isValid(postcode) = 1
GROUP BY LEFT(postcode, 2)
UNION
SELECT 'NULL', SUM(totalValue)
FROM Customer
WHERE postcode IS NULL
GROUP BY LEFT(postcode, 2)
UNION
SELECT 'EMPTY', SUM(totalValue)
FROM Customer
WHERE postcode = ''
GROUP BY LEFT(postcode, 2)
UNION
SELECT 'WRONG', SUM(totalValue)
FROM Customer
WHERE dbo.fn_isValid(postcode) = 0
GROUP BY LEFT(postcode, 2)
Then you need the evaluation function, something like this:
CREATE FUNCTION [dbo].[fn_IsValid]
(
@Code nchar(32)
)
RETURNS bit
AS
BEGIN
DECLARE @Status bit
SET @Status = -- Your validation logic
--be aware of empty and null values
RETURN @Status
END
Upvotes: -2
Reputation: 1269743
You can use LIKE
:
where postcode like '[A-Z][A-Z0-9][0-9]%'
You might also want to check the length and other characteristics, but this answers your specific question.
EDIT:
For a separate entry, use case
:
SELECT (CASE WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
THEN LEFT(postcode, 2)
ELSE 'Separate Entry'
END) as PostCode2, SUM(totalValue)
FROM Customer
GROUP BY (CASE WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
THEN LEFT(postcode, 2)
ELSE 'Separate Entry'
END);
Upvotes: 5
Reputation: 3499
Try this..
SELECT LEFT(postcode, 2), SUM(totalValue)
FROM Customer
WHERE ISNUMERIC(LEFT(postcode, 1))<> 1
GROUP BY LEFT(postcode, 2)
Hope this helps.
Upvotes: -1