I am working on a spider that filters contact information by type, and I've come across a regular expression that looks very promising. The only issue is that it requires the entire mailing address in order to match.
^(?n:(?<address1>(\d{1,5}(\ 1\/[234])?(\x20[A-Z]([a-z])+)+)|(P\.O\.\ Box\ \d{1,5}))\s{1,2}(?i:(?<address2>(((APT|BLDG|DEPT|FL|HNGR|LOT|PIER|RM|S(LIP|PC|T(E|OP))|TRLR|UNIT)\x20\w{1,5})|(BSMT|FRNT|LBBY|LOWR|OFC|PH|REAR|SIDE|UPPR)\.?)\s{1,2})?)(?<city>[A-Z]([a-z])+(\.?)(\x20[A-Z]([a-z])+){0,2})\,\x20(?<state>A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\x20(?<zipcode>(?!0{5})\d{5}(-\d{4})?))$
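The pattern above is a chain of named groups: `address1` (a street number followed by capitalized words, or a P.O. Box), an optional case-insensitive `address2` (a unit designator such as APT, BLDG or STE), then `city`, a comma, a two-letter `state`, and a five-digit `zipcode` with an optional four-digit extension. The `n` option in `(?n:...)` makes only the named groups capture. The street number and name live entirely in `address1`. A minimal PHP sketch that keeps only that alternative follows; replacing the `(?n:...)` wrapper with plain `(?:...)` groups is an assumption made for this sketch, not part of the original pattern.

```php
<?php
// Sketch: only the address1 part of the pattern above, anchored on its own.
// Non-capturing (?:...) groups stand in for the original (?n:...) wrapper,
// so the named group is the only capture.
$address1 = '/^(?<address1>\d{1,5}(?:\ 1\/[234])?(?:\x20[A-Z][a-z]+)+|P\.O\.\ Box\ \d{1,5})$/';

$samples = ['123 Main St', '92 Main St', '972 Smith dr', '9784 Hwy 12', 'P.O. Box 12345'];
foreach ($samples as $s) {
    // Every word after the number must be Capitalized and letters-only,
    // so a lowercase word ("dr") or a number ("12") makes the match fail.
    $ok = preg_match($address1, $s) === 1;
    echo str_pad($s, 16), $ok ? 'match' : 'no match', PHP_EOL;
}
```

With these samples, `972 Smith dr` and `9784 Hwy 12` are rejected, which shows why even the street part of the original is stricter than loosely entered addresses allow.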
I need the expression to require only the street number and name. However, I don't understand how the expression is broken up into pieces; otherwise, I'd make the changes myself. How would I alter the expression to accept any mailing address with a street number of up to 4 digits followed by any kind of words (since there isn't strong validation when the addresses are entered)?
123 Park Ave Apt 123 New York City, NY 10002
P.O. Box 12345 Los Angeles, CA 12304
123 Main St
123 City, State 00000
123 street city, ST 00000
123 Park Ave Apt 123
P.O. Box 12345
9784 Hwy 12
92 Main St
972 Smith dr
123 Main St, New York NY 14676
123 City, State 00000
123 street city, ST 00000
12345 street
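A minimal sketch of the relaxed rule described above (a 1 to 4 digit street number, then any whitespace-separated words), assuming that "any type of words" means any run of non-space characters. It uses PHP's `x` flag so the pattern can carry whitespace and `#` comments. It does not try to handle the P.O. Box lines; those would need an extra alternative.

```php
<?php
// Sketch only: x (extended) flag, so whitespace in the pattern is ignored
// and # starts a comment that runs to the end of the line.
$relaxed = '/^
    (?<number>\d{1,4})     # street number of 1 to 4 digits
    (?<rest>(?:\s+\S+)+)   # one or more whitespace-separated words of any kind
$/x';

$samples = [
    '123 Park Ave Apt 123',
    '9784 Hwy 12',
    '972 Smith dr',
    '123 Main St, New York NY 14676',
    '12345 street',
    'P.O. Box 12345',
];

foreach ($samples as $s) {
    if (preg_match($relaxed, $s, $m) === 1) {
        echo $s, ' => number ', $m['number'], PHP_EOL;
    } else {
        echo $s, ' => no match', PHP_EOL;
    }
}
```

Because `\S+` accepts punctuation and digits, a full address with city, state and ZIP also matches, while a five-digit number such as `12345 street` does not.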
These could be a good start:
/^(\d{1,4}|P\.O\.)([a-zA-Z\s]+)(\d+)?$/i
/^(\d{1,4}|P\.O\.)\s([a-zA-Z0-9\s]+)\s?(\d+)?$/i
/^(\d{1,4}\s|P\.O\.)([a-zA-Z0-9\s]+)(\d+)?$/i
// passes
123 Park Ave Apt 123
P.O. Box 12345
9784 Hwy 12
92 Main St
972 Smith dr
1809 Caddo St
// fails
10200 Highway 5 North
123 Main St, New York NY 14676
123 City, State 00000
123 street city, ST 00000
12345 street
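The pass/fail lists can be checked by running each of the three patterns over the same sample addresses. A short PHP sketch; the loop and output format are illustrative and not part of the patterns themselves.

```php
<?php
// Run the three candidate patterns over the sample addresses
// and print whether each pattern accepts each address.
$patterns = [
    '/^(\d{1,4}|P\.O\.)([a-zA-Z\s]+)(\d+)?$/i',
    '/^(\d{1,4}|P\.O\.)\s([a-zA-Z0-9\s]+)\s?(\d+)?$/i',
    '/^(\d{1,4}\s|P\.O\.)([a-zA-Z0-9\s]+)(\d+)?$/i',
];
$addresses = [
    '123 Park Ave Apt 123', 'P.O. Box 12345', '9784 Hwy 12', '92 Main St',
    '972 Smith dr', '1809 Caddo St', '10200 Highway 5 North',
    '123 Main St, New York NY 14676', '123 City, State 00000',
    '123 street city, ST 00000', '12345 street',
];
foreach ($patterns as $i => $pattern) {
    echo 'Pattern ', $i + 1, PHP_EOL;
    foreach ($addresses as $address) {
        // preg_match returns 1 on a match, 0 on no match (false on error)
        $result = preg_match($pattern, $address) === 1 ? 'passes' : 'fails';
        echo '  ', str_pad($address, 32), $result, PHP_EOL;
    }
}
```

All three patterns accept the first six addresses and reject the last five. `10200 Highway 5 North` is rejected by every pattern because `\d{1,4}` cannot consume a five-digit street number, and the addresses containing a comma fail because no character class includes `,`.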
Usage:
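<?php
$address = "123 Park Ave Apt 123";
// 1-4 digit street number or "P.O.", then letters/spaces, then an optional trailing number
$pattern = '/^(\d{1,4}|P\.O\.)([a-zA-Z\s]+)(\d+)?$/i';
if (preg_match($pattern, $address, $matches)) {
    echo $matches[0]; // the whole matched address
}
?>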
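To see what each capture group of the first pattern holds, a small PHP sketch that prints the groups for a few of the passing addresses. One detail worth knowing: without the `PREG_UNMATCHED_AS_NULL` flag, `preg_match` leaves out a trailing group that did not participate, so `$matches[3]` is simply absent for addresses that do not end in digits.

```php
<?php
// Sketch: print the pieces captured by the first pattern.
$pattern = '/^(\d{1,4}|P\.O\.)([a-zA-Z\s]+)(\d+)?$/i';

foreach (['123 Park Ave Apt 123', '92 Main St', 'P.O. Box 12345'] as $address) {
    if (preg_match($pattern, $address, $matches) === 1) {
        // Group 1: street number or "P.O."
        // Group 2: the run of letters and spaces after it (trimmed for display)
        // Group 3: trailing number, missing from $matches when it did not match
        printf(
            "%s | number=%s | words=%s | trailing=%s\n",
            $address,
            $matches[1],
            trim($matches[2]),
            $matches[3] ?? '(none)'
        );
    }
}
```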
Testing in progress... :)