Reputation: 10447
I'm trying to get the first part of a UK postcode from a string that may contain either the full postcode or only its first part, and I'm struggling to make it work. It works when the full postcode is entered, by using a look-ahead, but I can't seem to make the look-ahead optional so that the first part is also matched when it is entered on its own.
My regex so far is ([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW])(?=( ?[0-9][ABD-HJLNP-UW-Z]{2})))
I've got several postcodes that must match and these are the results using the above regex:
A10EA - Should match and does
A1 - Should match but doesn't
A10 0EA - Should match and does
A10 - Should match but doesn't
BH18 1AE - Should match and does
BH18AE - Should match and does
EC1M 6HJ - Should match and does
EC1M - Should match but doesn't
Z10 2EV - Shouldn't match and doesn't
QE3 6DA - Shouldn't match but matches E3 6DA
Can someone please help me solve this issue?
The RegEx I've been working from is the official one from the post office:
/^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$/i
Before anyone flags this as a duplicate of "PHP Find first part of UK postcode when full or part can be entered", it's not. The answer to that question doesn't work; see my comment on the answer.
Upvotes: 1
Views: 2180
Reputation: 8650
According to this wiki page, the postcode always ends in 'digit letter letter', which as a regex pattern is \d\w\w$ (strictly, \w also matches digits and underscores, so [A-Za-z] would be tighter). Now that we know how to spot the end, we just want to capture the rest.
A pattern like (\S*)\s*\d\w\w$ will work. It captures the first half and ensures that you do not get the final 'digit letter letter' part. The first part is captured by taking anything that is not white space, i.e. only letters and digits.
To fully explain this: the brackets () mark what we are capturing. \S means 'any one non-white-space character', and \S* takes as many of them as it can. So (\S*) captures everything up to a space character, but will capture everything if the user doesn't enter one. The full regex I provided also requires 'any white space, one digit, two letters, end of string' after the capture, which ensures that AA999AA is split into AA99 and 9AA.
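As a rough illustration of how that pattern behaves in PHP, here is a minimal sketch. The function name outward_from_full is made up for this example, and it only covers input that really does end in the second part of the postcode.

```php
<?php
// Hypothetical helper: the pattern is the one described above,
// the function name is only for illustration.
function outward_from_full(string $postcode): ?string
{
    // (\S*)   greedy run of non-white-space characters (the part we keep)
    // \s*     optional separator between the two halves
    // \d\w\w$ one digit and two word characters at the end of the string
    if (preg_match('/(\S*)\s*\d\w\w$/', trim($postcode), $m)) {
        return $m[1];
    }

    // No trailing "digit letter letter" section was found
    return null;
}

var_dump(outward_from_full('AA999AA'));  // string(4) "AA99"
var_dump(outward_from_full('BH18 1AE')); // string(4) "BH18"
var_dump(outward_from_full('EC1M'));     // NULL
```

The last call returns null because a first part on its own never ends in a digit followed by two more characters, so input like that needs separate handling.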
I've also just noticed though that your question states you might not actually have that second part. I think you could get around that by checking the string length. If you trim white space and the length is less than 5 characters, you must only have the first part, so no need for any regex.
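A possible sketch of that length check, assuming the input has not been validated yet and that anything shorter than 5 characters after trimming is a first part on its own. The name outward_by_length is hypothetical.

```php
<?php
// Hypothetical helper combining the length check with the regex above.
function outward_by_length(string $input): string
{
    $postcode = trim($input);

    // Fewer than 5 characters: assume only the first part was entered
    if (strlen($postcode) < 5) {
        return $postcode;
    }

    // Otherwise strip the trailing "digit letter letter" section,
    // together with any white space in front of it
    return preg_replace('/\s*\d\w\w$/', '', $postcode) ?? $postcode;
}

var_dump(outward_by_length('EC1M'));     // string(4) "EC1M"
var_dump(outward_by_length('EC1M 6HJ')); // string(4) "EC1M"
var_dump(outward_by_length('A10EA'));    // string(2) "A1"
```

This does not check that the input is a real postcode; it only decides which branch to take from the length.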
Disclaimer: this will not work for Anguillan postcodes. To support those as well, I think (\S*)\s*(?:\d\w\w|-\d{4})$ would work.
Upvotes: 1
Reputation: 10447
I've been looking at this the wrong way. I want to get the first part of the postcode and remove the second part if present, so why not validate the postcode first, then check for an end section and strip it if necessary?
I'm already validating the postcode, this is code I already had:
$validate = Validation::factory(array('postcode' => $postcode));
$validate->rule('postcode', 'not_empty');
$validate->rule('postcode', 'regex', array(':value', '/^(GIR ?(0AA)?|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?([0-9][ABD-HJLNP-UW-Z]{2})?)$/i'));
if ( ! $validate->check())
{
    $postcode = '';
}
So now I've added in this after it:
if ($postcode)
{
    $short_postcode = $postcode;
    // Check for an end section and, if present, remove it
    if (preg_match('/ ?([0-9][ABD-HJLNP-UW-Z]{2})$/i', $postcode, $match, PREG_OFFSET_CAPTURE))
    {
        // $match[0][1] is the offset at which the end section starts
        $short_postcode = substr($postcode, 0, $match[0][1]);
    }
}
and this leaves me with just the first part of the postcode, which is what I wanted. This Eval.in shows it working for all the examples in my question.
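For anyone not using the Validation class, the same two steps can be sketched in plain PHP. This is only an illustration under that assumption: short_postcode() is a made-up name, preg_match stands in for the validation rule, and the two patterns are the ones described in this answer.

```php
<?php
// Hypothetical stand-alone version: validate first, then strip the end section.
function short_postcode(string $postcode): string
{
    // Same validation pattern as above, with the second part optional
    $valid = '/^(GIR ?(0AA)?|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?([0-9][ABD-HJLNP-UW-Z]{2})?)$/i';
    if (!preg_match($valid, $postcode)) {
        return ''; // invalid input, mirrors $postcode = '' above
    }

    // Look for an end section and cut the string where it starts
    if (preg_match('/ ?([0-9][ABD-HJLNP-UW-Z]{2})$/i', $postcode, $match, PREG_OFFSET_CAPTURE)) {
        return substr($postcode, 0, $match[0][1]);
    }

    return $postcode; // only the first part was entered
}

$examples = ['A10EA', 'A1', 'A10 0EA', 'A10', 'BH18 1AE', 'BH18AE',
             'EC1M 6HJ', 'EC1M', 'Z10 2EV', 'QE3 6DA'];
foreach ($examples as $example) {
    printf("%-8s => '%s'\n", $example, short_postcode($example));
}
```

With the examples from the question, the two that shouldn't match (Z10 2EV and QE3 6DA) come back as empty strings, and the rest come back as just the first part.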
Upvotes: 0