Nate Glenn

Reputation: 6744

Balancing reluctant and greedy matching

I am trying to match the two address lines below (mostly fictional addresses):

2320 ZINER CIR East 43123
1111 ZINER CIR East Bernstadt 43123

My regular expression is built from a list of city names, and "East Bernstadt" is one of them. However, street names can also end in "East". My problem is that if I match "East" greedily, as in:

\d+ [^ ]+ CIR( East)?( East Bernstadt)?(?: \d+)?

...then only the first line is matched completely (the second is only partially matched). If I use a reluctant match, as in:

\d+ [^ ]+ CIR( East)??( East Bernstadt)?(?: \d+)?

...the second line matches but not the first.
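A minimal sketch that reproduces this, assuming Python's `re` module (the question does not say which engine is used, but `re` supports the same `??` reluctant quantifier). It assumes the pattern is applied with an unanchored search (`re.search`), which stops at the first successful match instead of requiring the whole line to be consumed; `matched_text` is a hypothetical helper.

```python
import re

# The two patterns from the question; "??" makes the " East" group reluctant (lazy).
GREEDY = r"\d+ [^ ]+ CIR( East)?( East Bernstadt)?(?: \d+)?"
RELUCTANT = r"\d+ [^ ]+ CIR( East)??( East Bernstadt)?(?: \d+)?"

LINES = [
    "2320 ZINER CIR East 43123",
    "1111 ZINER CIR East Bernstadt 43123",
]

def matched_text(pattern, line):
    """Return the text found by an unanchored search, or None."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

for line in LINES:
    # Greedy stops after "CIR East" on line 2; reluctant stops after "CIR" on line 1.
    print(repr(matched_text(GREEDY, line)), repr(matched_text(RELUCTANT, line)))

# With an anchored match (re.fullmatch), the engine backtracks out of " East"
# when the rest of the line cannot be consumed, so the greedy pattern
# matches both whole lines.
for line in LINES:
    print(re.fullmatch(GREEDY, line) is not None)
```

The anchored variant only helps when each address is matched as an entire string; it does not apply when the pattern has to be found inside longer text.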

How can I change the regular expression so that both lines are matched completely? "East" and "East Bernstadt" must remain in separate parts of the expression.

EDIT: I cannot handle "East" and "East Bernstadt" with a single parenthesized group. Both lines above must match, and so must "1234 Ziner CIR East East Bernstadt" (some street names include a cardinal direction).

Upvotes: 1

Views: 45

Answers (1)

Tim007

Reputation: 2557

Try this

\d+\s+\S+\s+CIR(?:(?!\sEast Bernstadt)\s+East)?(?:\s+East Bernstadt)?(?: +\d+)?
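A quick offline check, assuming Python's `re` module, that the pattern above consumes each of the three sample lines from the question completely under an unanchored search; `is_complete` is a hypothetical helper.

```python
import re

# Pattern copied verbatim from the answer above.
ANSWER = r"\d+\s+\S+\s+CIR(?:(?!\sEast Bernstadt)\s+East)?(?:\s+East Bernstadt)?(?: +\d+)?"

# The two lines from the question plus the case added in the EDIT.
SAMPLES = [
    "2320 ZINER CIR East 43123",
    "1111 ZINER CIR East Bernstadt 43123",
    "1234 Ziner CIR East East Bernstadt",
]

def is_complete(line):
    """True if an unanchored search matches the entire line."""
    m = re.search(ANSWER, line)
    return m is not None and m.group(0) == line

for line in SAMPLES:
    print(line, "->", "complete" if is_complete(line) else "partial")
```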


Explanation:
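\s: a whitespace character, such as a space, tab, newline, carriage return, or vertical tab
\S: one character that is not a whitespace character as defined by \s
(?!…): negative lookahead; here (?!\sEast Bernstadt) stops the optional street-direction group from consuming the "East" that begins the city name "East Bernstadt"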
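To show that "East" and "East Bernstadt" still land in separate parts of the expression, here is a hypothetical variant of the answer's pattern with named groups (`direction`, `city` and `zip_code` are illustrative names, not from the original), again assuming Python's `re` module.

```python
import re

# Hypothetical variant of the answer's pattern with named capturing groups.
PATTERN = re.compile(
    r"\d+\s+\S+\s+CIR"
    # Street direction; the lookahead skips it when "East" starts the city name.
    r"(?:(?!\sEast Bernstadt)\s+(?P<direction>East))?"
    r"(?:\s+(?P<city>East Bernstadt))?"
    r"(?: +(?P<zip_code>\d+))?"
)

def parse(line):
    """Return (direction, city, zip_code) for the first match, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("direction"), m.group("city"), m.group("zip_code")

for line in [
    "2320 ZINER CIR East 43123",
    "1111 ZINER CIR East Bernstadt 43123",
    "1234 Ziner CIR East East Bernstadt",
]:
    print(parse(line))
```

One caveat: the lookahead checks a single whitespace character (`\s`) while the groups allow several (`\s+`). With two spaces between "CIR" and "East Bernstadt", the lookahead no longer recognises the city, "East" is taken as the direction, and the match stops early; using `\s+` inside the lookahead as well would avoid that.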

Upvotes: 1
