Reputation: 9
I have searched all over but can't seem to figure this out. I have some addresses and I am trying to break out the street name.
I need to be able to get Sea Spray, Walden Elms, or High Star, along with the single-word streets and numbered streets such as 54th or 12th.
996 SEA SPRAY DR
174 S WALDEN ELMS CIR
1210 CHAPEL CONE LN # 1210
602 SAWYER ST # 710
911 STATE HWY
16715 CLAY RD
12302 HIGH STAR DR
575 PETE SCHAFF BLVD
2700 TOWN CENTER BLVD N
601 54TH ST # 1105
815 12TH ST
The regex below gets the streets I need, but it includes the street suffix on every street other than 54th and 12th. Why isn't the last non-capturing group working?
(\d+(?:ST|RD|TH|ND|BLVD|LN|DR|CIR))\s|(\s[A-Z]\w*)|(\d+(?:ST|RD|BLVD|CIR|LN))
Upvotes: 0
Views: 338
Reputation: 922
If your address list is limited and you can predict the format as shown above, could you not use a simple string split in C#, like this?
// Caveat: Split matches these substrings anywhere, including inside words such as STATE or STAR.
string[] arrSplitAdd = address.Split(new string[] {"ST","RD","TH","ND","BLVD","LN","DR","CIR"}, StringSplitOptions.RemoveEmptyEntries);
string numberAndStreet = arrSplitAdd[0];
// Strip the digits (the house number); Regex lives in System.Text.RegularExpressions.
string streetName = Regex.Replace(numberAndStreet, "[0-9]+", "").Trim();
Iterate this logic for each address line.
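If splitting on substrings turns out to be too blunt, a token-based variant may be easier to control. The following is only a minimal sketch of that idea, not the code above: the class name `StreetNameSplitter`, the suffix list, and the directional list are all assumptions made for illustration and would need adjusting to your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreetNameSplitter
{
    // Assumed suffix and directional lists; extend them to fit your addresses.
    private static readonly HashSet<string> Suffixes =
        new HashSet<string> { "ST", "RD", "BLVD", "LN", "DR", "CIR", "HWY" };
    private static readonly HashSet<string> Directions =
        new HashSet<string> { "N", "S", "E", "W" };

    public static string StreetName(string address)
    {
        // Drop a unit designator such as "# 1210", if present.
        int hash = address.IndexOf('#');
        if (hash >= 0) address = address.Substring(0, hash);

        // Work on whole tokens so "ST" never matches inside "STATE".
        List<string> tokens = address
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();

        // Drop the leading house number when it is purely numeric.
        if (tokens.Count > 0 && tokens[0].All(char.IsDigit)) tokens.RemoveAt(0);

        // Drop a leading directional only when at least two tokens follow it.
        if (tokens.Count > 2 && Directions.Contains(tokens[0])) tokens.RemoveAt(0);

        // Keep tokens up to (not including) the first suffix.
        var name = new List<string>();
        foreach (string token in tokens)
        {
            if (name.Count > 0 && Suffixes.Contains(token)) break;
            name.Add(token);
        }
        return string.Join(" ", name);
    }
}
```

Calling `StreetNameSplitter.StreetName(line)` once per address line is the "iterate" step. The first token after the house number is never treated as a suffix, which is a heuristic and will not suit every address.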
Upvotes: 0
Reputation: 4607
The problem is that the \w* expression in your middle group is including the items you want to exclude.
Your regex is really three expressions joined with "or" conditionals (the |):
(\d+(?:ST|RD|TH|ND|BLVD|LN|DR|CIR))\s
(\s[A-Z]\w*)
(\d+(?:ST|RD|BLVD|CIR|LN))
The first group appears to be trying to match on number-based street names (ex: "14th", "3rd") and is successfully capturing the example addresses on 54th St and 12th St.
The third group seems to be just a subset of the first group, but without the trailing space (\s) expression. It isn't matching anything in your examples.
The second group captures any whitespace character (\s) followed by a single capital letter ([A-Z]), then any number of word characters (\w*). This matches pretty much everything else, suffixes included. If you want to exclude ST, RD, BLVD, etc. from what it matches, you'll need a negative lookahead assertion for those words, (?!(?:ST|RD|DR|BLVD|CIR|LN)\b), in your expression. The trailing \b makes the lookahead reject only whole words, so names that merely start with a suffix, such as STATE or STAR, still match. That makes the middle expression look like this:
(\s(?!(?:ST|RD|DR|BLVD|CIR|LN)\b)[A-Z]\w*)
and the full expression look like this:
(\d+(?:ST|RD|TH|ND|BLVD|LN|DR|CIR))\s|(\s(?!(?:ST|RD|DR|BLVD|CIR|LN)\b)[A-Z]\w*)|(\d+(?:ST|RD|BLVD|CIR|LN))
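To see what the lookahead changes in practice, here is a small C# sketch that runs a full expression of this shape over one address and collects the matched pieces. It assumes the lookahead list includes ST and ends in \b so that whole-word suffixes are rejected while names like STATE or STAR are kept. The class and method names (`LookaheadDemo`, `StreetWords`) are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Three alternatives: numbered street + space, capitalized word that is
    // not a whole-word suffix, numbered street without a trailing space.
    private const string Pattern =
        @"(\d+(?:ST|RD|TH|ND|BLVD|LN|DR|CIR))\s" +
        @"|(\s(?!(?:ST|RD|DR|BLVD|CIR|LN)\b)[A-Z]\w*)" +
        @"|(\d+(?:ST|RD|BLVD|CIR|LN))";

    // Returns every match in the address, with surrounding whitespace trimmed.
    public static List<string> StreetWords(string address)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(address, Pattern))
        {
            words.Add(m.Value.Trim());
        }
        return words;
    }
}
```

For example, `string.Join(" ", LookaheadDemo.StreetWords("996 SEA SPRAY DR"))` gives "SEA SPRAY", and "601 54TH ST # 1105" gives just "54TH". Note that a leading directional is still matched by the middle group (for "174 S WALDEN ELMS CIR" you get S, WALDEN, and ELMS), and any suffix missing from the lookahead list, such as HWY, still comes through.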
All that said, I think you will run into addresses that this approach cannot handle, for example:
15W22S 87th St
This would fail your regex since the house "number" includes letters.
Upvotes: 1