Regex to extract street from full address but leaving out optional directional component

Question

I'm trying to use .NET Regex to extract the street portion of a full address.

Given these addresses:
2565 W Field Stream Drive
2565 Field St
2565 2nd Street
2001 Easterman Road

I want these results:
Field Stream Drive
Field St
2nd Street
Easterman Road

I've come up with this pattern, "(?<=(^\d+\s[NSEW]{1}\s)).*(?=$)", but it returns no match when the directional element is missing.

Wiktor Stribiżew · Accepted Answer

The problem is that the regex engine tries the lookbehind at every position in the string, going from left to right, and stops at the first position where it succeeds. So you can't simply make [WNSE]\s+ optional inside the lookbehind, as in (?<=^\d+\s+(?:[WNSE]\s+)?).+: the lookbehind already succeeds right after the digits and whitespace, with the optional group matching nothing, so .+ starts there and the match still includes the directional letter.
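To make the pitfall concrete, here is a minimal C# sketch that runs the optional-group lookbehind against two of the sample addresses. The class name OptionalLookbehindDemo and the Extract helper are made up for illustration; only Regex.Match and Match.Value come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalLookbehindDemo
{
    // The pattern discussed above: the directional part is optional inside the lookbehind.
    private static readonly Regex Naive =
        new Regex(@"(?<=^\d+\s+(?:[WNSE]\s+)?).+");

    // Hypothetical helper: returns the whole match value ("" when nothing matches).
    public static string Extract(string address) => Naive.Match(address).Value;

    public static void Main()
    {
        // The lookbehind succeeds right after "2565 ", so the directional letter is kept.
        Console.WriteLine(Extract("2565 W Field Stream Drive")); // W Field Stream Drive
        Console.WriteLine(Extract("2565 Field St"));             // Field St
    }
}
```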

A less efficient solution, which works in .NET and returns just the street as the match value, is

(?<=^\d+\s+[WNSE]\s+|^\d+(?!\s+[WNSE]\s)\s+).+

The first alternative in the lookbehind matches a location that is preceded by 1+ digits, 1+ whitespace characters, one of W, N, S or E, and then 1+ whitespace characters. The second alternative matches a location preceded by 1+ digits and 1+ whitespace characters at the start of the string, provided the digits are not followed by whitespace, W, N, S or E, and a whitespace character.

See the regex demo.
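As a rough check, the lookbehind pattern above could be exercised from C# like this against the four addresses from the question. The class name LookbehindStreet and the Extract helper are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindStreet
{
    // Two-alternative lookbehind from the answer; the whole match is the street.
    private static readonly Regex Street =
        new Regex(@"(?<=^\d+\s+[WNSE]\s+|^\d+(?!\s+[WNSE]\s)\s+).+");

    // Hypothetical helper: returns the match value ("" when nothing matches).
    public static string Extract(string address) => Street.Match(address).Value;

    public static void Main()
    {
        string[] addresses =
        {
            "2565 W Field Stream Drive",
            "2565 Field St",
            "2565 2nd Street",
            "2001 Easterman Road",
        };

        foreach (string address in addresses)
        {
            Console.WriteLine(Extract(address));
        }
    }
}
```

Note that "2001 Easterman Road" keeps its leading E because the E is not followed by whitespace, so neither the first alternative nor the negative lookahead treats it as a directional.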

However, a much simpler solution is to use a capturing group:

^\d+\s+(?:[WNSE]\s+)?(.+)

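See the regex demo. Here, the optional group is tried before .+ gets a chance to match, so the capturing group holds only what comes after the N, S, E or W when one is present. The street is then in Group 1 rather than in the whole match value.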
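A minimal C# sketch of the capturing-group approach, reading the street from Group 1. The class name CaptureStreet and the Extract helper are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureStreet
{
    // Capturing-group pattern from the answer; Group 1 holds the street.
    private static readonly Regex Street =
        new Regex(@"^\d+\s+(?:[WNSE]\s+)?(.+)");

    // Hypothetical helper: returns Group 1, or "" when the address does not match.
    public static string Extract(string address)
    {
        Match m = Street.Match(address);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(Extract("2565 W Field Stream Drive")); // Field Stream Drive
        Console.WriteLine(Extract("2565 Field St"));             // Field St
        Console.WriteLine(Extract("2565 2nd Street"));           // 2nd Street
        Console.WriteLine(Extract("2001 Easterman Road"));       // Easterman Road
    }
}
```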
