elersong

Reputation: 857

REGEX to find only street number for PHP

I am working on a spider to filter contact information by type, and I've run across a regular expression that seems to have a great deal of promise. The only issue is that it requires the entire mailing address in order to pass scrutiny.

^(?n:(?<address1>(\d{1,5}(\ 1\/[234])?(\x20[A-Z]([a-z])+)+)|(P\.O\.\ Box\ \d{1,5}))\s{1,2}(?i:(?<address2>(((APT|BLDG|DEPT|FL|HNGR|LOT|PIER|RM|S(LIP|PC|T(E|OP))|TRLR|UNIT)\x20\w{1,5})|(BSMT|FRNT|LBBY|LOWR|OFC|PH|REAR|SIDE|UPPR)\.?)\s{1,2})?)(?<city>[A-Z]([a-z])+(\.?)(\x20[A-Z]([a-z])+){0,2})\,\x20(?<state>A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\x20(?<zipcode>(?!0{5})\d{5}(-\d{4})?))$

I need the expression to require only the street number and name. I don't understand how the pieces of the expression fit together, though; otherwise I'd make the changes myself. How would I alter it to accept any mailing address that has a street number of up to 4 digits followed by any words (the addresses aren't strongly validated when they are input)?


Currently Accepted Inputs:

123 Park Ave Apt 123 New York City, NY 10002 
P.O. Box 12345 Los Angeles, CA 12304

Currently Denied Inputs:

123 Main St 
123 City, State 00000
123 street city, ST 00000

Desired Accepted Inputs:

123 Park Ave Apt 123 
P.O. Box 12345 
9784 Hwy 12
92 Main St
972 Smith dr

Desired Denied Inputs:

123 Main St, New York NY 14676
123 City, State 00000
123 street city, ST 00000
12345 street

Upvotes: 1

Views: 512

Answers (1)

hex494D49

Reputation: 9235

This could be a good start:

/^(\d{1,4}|P\.O\.)([a-zA-Z\s]+)(\d+)?$/i    
/^(\d{1,4}|P\.O\.)\s([a-zA-Z0-9\s]+)\s?(\d+)?$/i
/^(\d{1,4}\s|P\.O\.)([a-zA-Z0-9\s]+)(\d+)?$/i

// passes
123 Park Ave Apt 123
P.O. Box 12345
9784 Hwy 12
92 Main St
972 Smith dr
1809 Caddo St

// fails
123 Main St, New York NY 14676
123 City, State 00000
123 street city, ST 00000
12345 street
10200 Highway 5 North (five-digit street number, so \d{1,4} rejects it)
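
To check the lists mechanically rather than by eye, something like the sketch below could help. It runs the third pattern against the desired accepted and denied inputs from the question. The helper name `checkAddresses` and the variable names are made up for illustration; only `preg_match()` is a built-in.

```php
<?php
// Third pattern from the answer above.
$pattern = '/^(\d{1,4}\s|P\.O\.)([a-zA-Z0-9\s]+)(\d+)?$/i';

// Inputs copied from the question's "desired" lists.
$shouldPass = [
    '123 Park Ave Apt 123',
    'P.O. Box 12345',
    '9784 Hwy 12',
    '92 Main St',
    '972 Smith dr',
];
$shouldFail = [
    '123 Main St, New York NY 14676',
    '123 City, State 00000',
    '123 street city, ST 00000',
    '12345 street',
];

// Hypothetical helper: returns the inputs whose match result differs from $expected.
function checkAddresses(string $pattern, array $inputs, bool $expected): array
{
    $mismatches = [];
    foreach ($inputs as $input) {
        // trim() guards against stray whitespace left over from scraping.
        $matched = preg_match($pattern, trim($input)) === 1;
        if ($matched !== $expected) {
            $mismatches[] = $input;
        }
    }
    return $mismatches;
}

$mismatches = array_merge(
    checkAddresses($pattern, $shouldPass, true),
    checkAddresses($pattern, $shouldFail, false)
);

echo $mismatches === []
    ? "all inputs behave as listed\n"
    : "unexpected: " . implode('; ', $mismatches) . "\n";
```

Swapping in one of the other two patterns only requires changing `$pattern`, which makes it easy to compare them on the same inputs.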

Usage:

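<?php

// Address to test and the first pattern from above.
$address = "123 Park Ave Apt 123";
$pattern = '/^(\d{1,4}|P\.O\.)([a-zA-Z\s]+)(\d+)?$/i';

// preg_match() returns 1 on a match; $matches[0] holds the full matched string.
if (preg_match($pattern, $address, $matches) === 1) {
    echo $matches[0];
}

?>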
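
Since the question mentions not knowing how the pieces of the expression fit together, here is a rough annotated sketch of the third pattern using the `x` (extended) flag, which lets you put whitespace and `#` comments inside the pattern. Two things here are my own assumptions and not part of the patterns above: the group names `number` and `rest`, and dropping the trailing `(\d+)?`, which should be redundant once the character class already allows digits.

```php
<?php
// Assumption: same logic as the third pattern, spread out with the x flag.
// Avoid the delimiter character and single quotes inside the # comments.
$pattern = '/
    ^
    (?<number> \d{1,4}\s | P\.O\. )   # 1-4 digit street number plus a space, or the literal "P.O."
    (?<rest>   [a-zA-Z0-9\s]+ )        # street name, unit, box number: letters, digits, whitespace
    $                                  # nothing else allowed, so commas (city, state) are rejected
/xi';

$address = 'P.O. Box 12345';
if (preg_match($pattern, $address, $m) === 1) {
    // Named groups are available by name in $m.
    echo trim($m['number']), ' | ', trim($m['rest']), "\n";   // prints "P.O. | Box 12345"
}
```

The anchors do most of the work: `^` forces the 1-4 digit number to start the string (so a 5-digit number fails), and `$` together with the restricted character class keeps out anything with a comma.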

Testing in progress... :)


Upvotes: 1
