Matthew
Matthew

Reputation: 85

Struggling with SQL Where filter

I have a table, [Customer], which has [postcode] and [totalValue] fields.

I need to design a query that returns the first two characters of all [postcode], and then a SUM of the [totalValue] for that particular postcode group (so one value each for NW, L1, etc.).

I have this code, which is running fine:

SELECT LEFT(postcode, 2), SUM(totalValue)
  FROM Customer
  GROUP BY LEFT(postcode, 2)

...however I have entries in [Customer] that don't contain a valid [postcode].

A valid UK postcode is always either one or two letters followed by a number (e.g. LE1, L12, etc.).

I would like to filter out all the incorrect/empty/Null [postcode] records into a separate record entry, but this is beyond my skillset.

Upvotes: 2

Views: 630

Answers (6)

GarethD
GarethD

Reputation: 69759

Your assumptions about valid postcodes is slightly out

A valid UK postcode is always either one or two letters followed by a number (e.g. LE1, L12, etc.).

In its simplest terms the valid formats for a UK Postcode are:

+----------+---------------------------------------------+----------+
| Format   |               Coverage                      | Example  |
+----------+---------------------------------------------+----------+
| AA9A 9AA | WC postcode area; EC1–EC4, NW1W, SE1P, SW1  | EC1A 1BB |
+----------+---------------------------------------------+----------+
| A9A 9AA  | E1W, N1C, N1P                               | W1A 0AX  |
+----------+---------------------------------------------+----------+
| A9 9AA   | B, E, G, L, M, N, S, W                      | M1 1AE   |
| A99 9AA  |                                             | B33 8TH  |
+----------+---------------------------------------------+----------+
| AA9 9AA  |    All other postcodes                      | CR2 6XH  |
| AA99 9AA |                                             | DN55 1PT |
+----------+---------------------------------------------+----------+

Each of which you can define a pattern match for:

AA9A 9AA - [A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]
A9A 9AA  - [A-Z][0-9][A-Z] [0-9][A-Z][A-Z]
A9 9AA   - [A-Z][0-9] [0-9][A-Z][A-Z]
A99 9AA  - [A-Z][0-9][0-9] [0-9][A-Z][A-Z]
AA9 9AA  - [A-Z][A-Z][0-9] [0-9][A-Z][A-Z]
AA99 9AA - [A-Z][A-Z][0-9][0-9] [0-9][A-Z][A-Z]

For something as re-usable as this, I think it is worth storing, so I would create a table for it:

CREATE TABLE dbo.SimplePostCodeValidation
(
    PostCode VARCHAR(8) NOT NULL,
    Pattern VARCHAR(50) NOT NULL
);
INSERT dbo.SimplePostCodeValidation (PostCode, Pattern)
VALUES
    ('AA9A 9AA', '[A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'),
    ('A9A 9AA', '[A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'),
    ('A9 9AA', '[A-Z][0-9] [0-9][A-Z][A-Z]'),
    ('A99 9AA', '[A-Z][0-9][0-9] [0-9][A-Z][A-Z]'),
    ('AA9 9AA', '[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]'),
    ('AA99 9AA', '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]'),
    -- REPEAT THE POSTCODES WITHOUT SPACES
    ('AA9A9AA', '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]'),
    ('A9A9AA', '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]'),
    ('A99AA', '[A-Z][0-9][0-9][A-Z][A-Z]'),
    ('A999AA', '[A-Z][0-9][0-9][0-9][A-Z][A-Z]'),
    ('AA99AA', '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]'),
    ('AA999AA', '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]');

Now you can easily validate your postcodes:

DECLARE @T TABLE (Postcode VARCHAR(8));
INSERT @T (PostCode)
SELECT PostCode
FROM dbo.SimplePostCodeValidation
UNION ALL
SELECT  PostCode
FROM    (VALUES ('123456'), (''), ('TEST')) t (PostCode);

SELECT  t.PostCode, IsValid = CASE WHEN pc.PostCode IS NULL THEN 0 ELSE 1 END
FROM    @T AS t
        LEFT JOIN SimplePostCodeValidation AS pc
            ON t.PostCode LIKE pc.Pattern;

Which returns:

PostCode    IsValid
----------------------
AA9A 9AA    1
A9A 9AA     1
A9 9AA      1
A99 9AA     1
AA9 9AA     1
AA99 9AA    1
123456      0
            0
TEST        0

To apply this to your situation you would use:

SELECT  CASE WHEN pc.PostCode IS NULL THEN 'Invalid' ELSE LEFT(c.postcode, 2) END, 
        TotalValue = SUM(totalValue)
FROM    Customer AS c
        LEFT JOIN SimplePostCodeValidation AS pc
            ON t.PostCode LIKE pc.Pattern;
GROUP BY CASE WHEN pc.PostCode IS NULL THEN 'Invalid' ELSE LEFT(c.postcode, 2) END;

If you want to get more complicated, there are actually further limitations to what is a valid postcode, e.g. if it is the pattern A9 9AA then the first letter can only be one of (B, E, G, L, M, N, S, W). The guidelines set out on wikipedia state:

  • Areas with only single-digit districts: BR, FY, HA, HD, HG, HR, HS, HX, JE, LD, SM, SR, WC, WN, ZE (although WC is always subdivided by a further letter, e.g. WC1A).
  • Areas with only double-digit districts: AB, LL, SO.
  • Areas with a district '0' (zero): BL, BS, CM, CR, FY, HA, PR, SL, SS (BS is the only area to have both a district 0 and a district 10).
  • The following central London single-digit districts have been further divided by inserting a letter after the digit and before the space: EC1–EC4 (but not EC50), SW1, W1, WC1, WC2, and part of E1 (E1W), N1 (N1C and N1P), NW1 (NW1W) and SE1 (SE1P).
  • The letters QVX are not used in the first position.
  • The letters IJZ are not used in the second position.
  • The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
  • The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
  • The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
  • Post code sectors are one of ten digits: 0 to 9 with 0 only used once 9 has been used in a post town, save for Croydon and Newport (see above).

Since SQL Server does not support full regex it gets a bit more complicated to account for all these caveats. If you really wanted fool proof validation I would be inclined to use the regex from answers this question, and use a CLR function to validate the postcode.

Upvotes: 1

Mr. Mascaro
Mr. Mascaro

Reputation: 2733

Query:

WITH CTE AS (SELECT CASE WHEN postcode LIKE('[A-Z][A-Z0-9][0-9]%')
                         THEN LEFT(postcode,2) ELSE 'Invalid' END AS PostCode
                   ,totalValue
             FROM Customer)
SELECT PostCode, SUM(totalValue) as totalValue
FROM CTE
GROUP BY PostCode  

Uses a CTE to gather the data. Invalid postcodes will be grouped into 'Invalid'.

Upvotes: 0

Alsin
Alsin

Reputation: 1618

Try this. All incorrect (or NULL) postcodes will be accumulated in '--' entry:

SELECT 
  CASE 
    WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
    THEN LEFT(postcode, 2)
    ELSE '--'
  END, 
  SUM(totalValue)
FROM Customer
GROUP BY 
  CASE 
    WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
    THEN LEFT(postcode, 2)
    ELSE '--'
  END

Upvotes: 0

Simone
Simone

Reputation: 1924

This is the select:

SELECT LEFT(postcode, 2), SUM(totalValue)
FROM Customer
WHERE postcode IS NOT NULL AND postcode != '' AND dbo.fn_isValid(postcode) = 1
GROUP BY LEFT(postcode, 2)
UNION
SELECT 'NULL', SUM(totalValue)
FROM Customer
WHERE postcode IS NULL
GROUP BY LEFT(postcode, 2)
UNION 
SELECT 'EMPTY', SUM(totalValue)
FROM Customer
WHERE postcode = ''
GROUP BY LEFT(postcode, 2)
UNION
SELECT 'WRONG', SUM(totalValue)
FROM Customer
WHERE dbo.fn_isValid(postcode) = 0
GROUP BY LEFT(postcode, 2)

Then you need the evaluation function, something like this:

CREATE FUNCTION [dbo].[fn_IsValid]
    (
    @Code nchar(32)
    )
RETURNS bit
AS
BEGIN
DECLARE @Status bit

SET @Status = -- Your validation logic
--be aware of empty and null values

RETURN @Status
END

Upvotes: -2

Gordon Linoff
Gordon Linoff

Reputation: 1269743

You can use LIKE:

where postcode like '[A-Z][A-Z0-9][0-9]%'

You might also want to check the length and other characteristics, but this answers your specific question.

EDIT:

For a separate entry, use case:

SELECT (CASE WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
             THEN LEFT(postcode, 2)
             ELSE 'Separate Entry'
        END) as PostCode2, SUM(totalValue)
FROM Customer
GROUP BY (CASE WHEN postcode like '[A-Z][A-Z0-9][0-9]%'
               THEN LEFT(postcode, 2)
               ELSE 'Separate Entry'
          END);

Upvotes: 5

Leptonator
Leptonator

Reputation: 3499

Try this..

SELECT LEFT(postcode, 2), SUM(totalValue)
  FROM Customer
WHERE ISNUMERIC(LEFT(postcode, 1))<> 1
GROUP BY LEFT(postcode, 2)

Hope this helps.

Upvotes: -1

Related Questions