Eric R.

Reputation: 983

Matching Conditions in Regex

Just a note upfront: I'm a bit of a regex newbie. Perhaps a good answer to this question would involve linking me to a resource that explains how these sorts of conditions work :)

Let's say that I have a street name, like 23rd St or 5th St. I'd like to get rid of the "th", "rd", "nd", or "st" that follows the number. How can this be done?

Right now I have the expression (st|nd|rd|th). The problem is that it also matches "st", "nd", "rd", or "th" anywhere inside a street name. What I really need is a conditional match that requires at least one digit immediately before the suffix (i.e., 1st but not street).
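To see the problem concretely, here is a small Python sketch (Python is just a convenient choice for illustration, and "1st Avenue North" is a made-up example) that applies the pattern with no surrounding context:

```python
import re

# The question's original pattern: it has no context, so it matches
# the letters anywhere, including inside ordinary words.
naive = re.compile(r"(st|nd|rd|th)")

print(naive.sub("", "1st Avenue North"))  # 1 Avenue Nor
```

The "st" after the 1 is removed as intended, but so is the "th" at the end of "North".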

Thank you!

Upvotes: 1

Views: 556

Answers (4)

Wiseguy

Reputation: 20873

It sounds like you just want to match the ordinal suffix (st|nd|rd|th), yes?

If your regex engine supports it, you could use a lookbehind assertion.

/(?<=\d)(st|nd|rd|th)/

That matches (st|nd|rd|th) only if it is preceded by a digit \d; the lookbehind is zero-width, so the digit itself is not part of the match.
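A minimal sketch of the lookbehind approach, assuming Python's re module (which supports lookbehind only for fixed-width patterns; a single \d qualifies). The helper name strip_suffix is hypothetical:

```python
import re

# (?<=\d) checks that a digit precedes the suffix without consuming it,
# so replacing the match with "" removes only the suffix.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)")


def strip_suffix(street: str) -> str:
    """Hypothetical helper: drop an ordinal suffix that follows a digit."""
    return ORDINAL_SUFFIX.sub("", street)


for name in ["23rd St", "5th St", "1st Avenue North"]:
    print(strip_suffix(name))
# 23 St
# 5 St
# 1 Avenue North
```

Note that the lookbehind alone does not check what follows the suffix, so a string like "azoiu32rdzeriuoiu" would become "azoiu32zeriuoiu".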

Upvotes: 5

piotrekkr

Reputation: 3226

Try using this regex:

(\d+)(?:st|nd|rd|th)

I don't know Ruby. In PHP I would use something like:

echo preg_replace('/(\d+)(?:st|nd|rd|th)\b/', '$1', 'South 2nd Street'); // South 2 Street

to remove the suffix.
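The same capture-and-replace idea, sketched in Python for illustration. It contrasts a pattern that ends with a literal space against one that ends with \b: the literal space is consumed by the match, so it disappears along with the suffix.

```python
import re

# Ends in a literal space: the space is part of the match and is lost.
with_space = re.compile(r"(\d+)(?:st|nd|rd|th) ")

# Ends in \b: a zero-width boundary, so the following space is kept.
with_boundary = re.compile(r"(\d+)(?:st|nd|rd|th)\b")

# \1 puts back the digits captured by group 1.
print(with_space.sub(r"\1", "South 2nd Street"))     # South 2Street
print(with_boundary.sub(r"\1", "South 2nd Street"))  # South 2 Street
```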

Upvotes: 1

fge

Reputation: 121840

What you really want are anchors.

Try replacing globally:

\b(\d+)(?:st|nd|rd|th)\b

with the first group.

Explanation:

  • \b --> matches a position where a word character (digit, letter, underscore) is followed by a non-word character (anything else), or the reverse; it also matches at the start or end of the string next to a word character;
  • (\d+) --> matches one or more digits and captures them in the first group ($1);
  • (?:st|nd|rd|th) --> matches any of st, nd, rd, th without capturing it ((?:...) is a non-capturing group);
  • \b --> see above.
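To show what the \b anchors contribute, here is a rough Python comparison (Python is an illustrative choice; its \b behaves the same way for these ASCII inputs) of the pattern with and without them, on two of the lines used in the Perl demonstration:

```python
import re

# The anchored pattern from this answer, and the same pattern without \b.
bounded = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b")
unbounded = re.compile(r"(\d+)(?:st|nd|rd|th)")

for line in ["Mark, 23rd street, New Hampshire", "azoiu32rdzeriuoiu"]:
    print(bounded.sub(r"\1", line))
    print(unbounded.sub(r"\1", line))
# Mark, 23 street, New Hampshire
# Mark, 23 street, New Hampshire
# azoiu32rdzeriuoiu
# azoiu32zeriuoiu
```

Both versions handle the ordinary street name, but only the anchored one leaves the digits-and-letters run untouched, because there is no word boundary between "u" and "3" or between "d" and "z".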

Demonstration using Perl:

$ perl -pe 's/\b(\d+)(?:st|nd|rd|th)\b/$1/g' <<EOF
> Mark, 23rd street, New Hampshire
> I live on the 7th avenue
> No match here...
> azoiu32rdzeriuoiu
> EOF
Mark, 23 street, New Hampshire
I live on the 7 avenue
No match here...
azoiu32rdzeriuoiu

Upvotes: 2

SpacedMonkey

Reputation: 2783

To remove the ordinal:

 s/(\d+)(?:st|nd|rd|th)\b/$1/

You must capture the number so you can replace the match with it. You can capture the ordinal or not, it doesn't matter unless you want to output it somewhere else.
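A small Python sketch (illustrative only; the sample sentence is made up) showing that capturing the suffix or not gives the same replacement, while a capturing group makes the suffix available if you want it elsewhere:

```python
import re

text = "Meet at 23rd St and 5th Ave"

# Non-capturing suffix: only the number (group 1) is put back.
dropped = re.sub(r"(\d+)(?:st|nd|rd|th)\b", r"\1", text)

# Capturing suffix: the replacement still uses only group 1.
kept = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)

# With the suffix captured as group 2, it can be collected separately.
suffixes = [m.group(2) for m in re.finditer(r"(\d+)(st|nd|rd|th)\b", text)]

print(dropped)          # Meet at 23 St and 5 Ave
print(kept == dropped)  # True
print(suffixes)         # ['rd', 'th']
```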

http://www.regular-expressions.info/javascriptexample.html

Upvotes: 0
