zack

Reputation: 31

Match each address from the address number to the 'street type'

I have a paragraph of text that contains the following addresses:

I want to match each address from the address number to the "street type" (ave., street, lane, road, rd., etc.), except for addresses that begin with the word "of".

So of the addresses above, the statement would match:

900 Greenwood St.
500 block of Main Street
670 W. Townline Ave.
1234 River Avenue

and would not match:

1125 Main Ave.

Upvotes: 0

Views: 592

Answers (4)

Mike Samuel

Reputation: 120586

When

s = "at 900 Greenwood St.\n\
in 500 block of Main Street\n\
at 670 W. Townline Ave.\n\
before 1234 River Avenue\n\
of 1125 Main Ave."

the regex

/(?:^|\s)(?:(?!of\b)[a-z]+)\s*(\d[\s\S]*?\b(?:ave\.|avenue|st\.|street|lane|road|rd\.))/gi

used thus

var addresses = [];
for (var match = [], re = /(?:^|\s)(?:(?!of\b)[a-z]+)\s*(\d[\s\S]*?\b(?:ave\.|avenue|st\.|street|lane|road|rd\.))/gi;
     match = re.exec(s);) {
  addresses.push(match[1]);
}

produces

["900 Greenwood St.","500 block of Main Street","670 W. Townline Ave.","1234 River Avenue"]

Upvotes: 1

stema

Reputation: 93086

This fulfills your request:

(?!^of\b)^.*?(\d+.*?(?:St\.|Street|Ave\.|Avenue))$

See it here on Regexr

(?!^of\b) Negative look ahead, row does not start with the word "of"

^ Matches the start of a row, use the m modifier!

.*? Matches any characters, non-greedily

(\d+.*? Once the first digits are found, the ( opens the first capturing group; the .*? then matches non-greedily up to the street type

(?:St\.|Street|Ave\.|Avenue)) Non capturing group because of the ?: matches the alternations between the |. The last ) closes the capturing group with the result.

$ Matches the end of the row, use the m modifier!

Your result is in the first capturing group.

Important: this works with your given examples. Real addresses vary so much that it will not work on every kind of existing address.
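As a rough sketch of how the pattern above could be applied in JavaScript, the snippet below runs it with the g and m flags over the sample text from Mike Samuel's answer. The variable names (text, pattern, found) are only illustrative, and the sample assumes one candidate address per row.

```javascript
// Sample text taken from Mike Samuel's answer; one candidate per row.
const text = [
  "at 900 Greenwood St.",
  "in 500 block of Main Street",
  "at 670 W. Townline Ave.",
  "before 1234 River Avenue",
  "of 1125 Main Ave.",
].join("\n");

// The m flag makes ^ and $ match at row boundaries; g finds every row.
const pattern = /(?!^of\b)^.*?(\d+.*?(?:St\.|Street|Ave\.|Avenue))$/gm;

// Collect the first capturing group of each match.
const found = Array.from(text.matchAll(pattern), (m) => m[1]);

console.log(found);
// ["900 Greenwood St.", "500 block of Main Street",
//  "670 W. Townline Ave.", "1234 River Avenue"]
```

Without the m flag, ^ and $ would only match at the start and end of the whole string, so nothing in this sample would match.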

Upvotes: 1

MadV

Reputation: 57

As far as I know, there isn't one simple regex pattern for this kind of complicated task. There are too many variables for a single pattern to work reliably. My first guess would be to look for "street", "ave", etc., but what if the street name doesn't have a suffix (e.g. 999 La Canada)? You could look for any phrase following "at", "in" or "before", but what if one of those phrases isn't an address? See what I mean?

My suggestion would be to take a look at Lingua::EN::AddressParse for Perl.
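To make the concern concrete, here is a small JavaScript sketch that runs a suffix-based pattern (the street-type list is borrowed from another answer in this thread) against three inputs. The second and third inputs are made-up examples, not text from the question: one is an address with no street-type suffix, the other is ordinary prose that happens to contain a number and the word "road".

```javascript
// Suffix-based pattern: first digit through the first street-type word.
const streetType = /\d[\s\S]*?\b(?:ave\.|avenue|st\.|street|lane|road|rd\.)/i;

// Hypothetical inputs chosen to show one hit, one miss, one false positive.
const samples = [
  "at 900 Greenwood St.",     // a normal address with a suffix
  "at 999 La Canada",         // an address without a suffix
  "in 2011 we hit the road",  // not an address at all
];

// Map each sample to the matched text, or null when nothing matches.
const results = samples.map((s) => {
  const m = s.match(streetType);
  return m ? m[0] : null;
});

console.log(results);
// ["900 Greenwood St.", null, "2011 we hit the road"]
```

The suffix-less address is missed and the non-address sentence is reported as a match, which is why a dedicated address parser tends to be a safer choice for free-form text.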

Upvotes: 2

Jonathan Hall

Reputation: 79784

var addrs = create_array_of_possible_addresses();
var matching_addrs = [];
for (var i = 0; i < addrs.length; i++) {
    // Skip candidates that begin with the word "of".
    if (/^of\b/.test(addrs[i])) continue;
    // Capture from the first digit through the street type.
    var m = addrs[i].match(/(\d.*(?:St\.?|Street|Ave\.?|Avenue|Ln\.?|Rd\.?|Road))/);
    if (m) matching_addrs.push(m[1]);
}

Untested.

Upvotes: 0
