Stark

Reputation: 593

Regex not matching numbers after hyphen

I have the following:

1.5 5 tablespoon cream
½ (1 cup) heavy cream
¼ – ½ teaspoon cream
1 tablespoon cream

^(?:[\-\.\/\s]*[\d↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟])+

I'm trying to figure out why ¼ – ½ isn't matching. I have an escaped hyphen inside the non-capturing group.

What I've tried:

^(?:[\-\.\/\s\W]*[\d↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟])+ but it also matches the ½ (1. I only want the match to continue as long as the separators are dots, hyphens, and spaces.

A negative lookahead together with \W, which matches non-word characters. The \W captures exactly what I'm trying to achieve, but the negative lookahead doesn't stop the ½ (1 from being captured. Here is what I tried: ^(?:[\-\.\/\s\W]*(?!\()*[\d↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟])+

https://regex101.com/r/bITQ4F/2

Upvotes: 2

Views: 201

Answers (1)

Bohemian

Reputation: 425043

The "hyphen" in your text is actually an EN DASH (hex 2013 or decimal 8211), not a regular hyphen (hex 2D or decimal 45).
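One way to confirm which dash is really in the text is to print the code point and Unicode name of each character. The following is a minimal Python sketch; the helper name describe is made up for illustration and is not part of the question.

```python
import unicodedata

def describe(text):
    """Return (char, code point, Unicode name) for each non-space character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isspace()
    ]

# The range from the question, pasted as-is (the dash is U+2013).
for row in describe("¼ \u2013 ½"):
    print(row)
# ('¼', 'U+00BC', 'VULGAR FRACTION ONE QUARTER')
# ('–', 'U+2013', 'EN DASH')
# ('½', 'U+00BD', 'VULGAR FRACTION ONE HALF')

# A keyboard hyphen reports a different name.
print(describe("-"))  # [('-', 'U+002D', 'HYPHEN-MINUS')]
```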

Copy-paste (so you grab the en dash) and use this:

^(?:[-–./\s]*[\d↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟])+

which has both the regular hyphen and the en dash character in the character class.

Note that none of the characters in the character class need escaping, not even the hyphen: placed first in the class, it is treated as a literal.
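As a quick check, here is a sketch that runs the pattern above against the four sample lines from the question using Python's built-in re module. The names QUANTITY and leading_quantity are illustrative assumptions, and the samples are assumed to contain ordinary spaces.

```python
import re

# Same pattern as above: hyphen-minus and en dash (U+2013) both in the class.
QUANTITY = re.compile(r"^(?:[-–./\s]*[\d↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟])+")

def leading_quantity(line):
    """Return the leading quantity of an ingredient line, or None."""
    m = QUANTITY.match(line)
    return m.group(0) if m else None

samples = [
    "1.5 5 tablespoon cream",
    "½ (1 cup) heavy cream",
    "¼ \u2013 ½ teaspoon cream",  # \u2013 is the en dash
    "1 tablespoon cream",
]

for line in samples:
    print(repr(leading_quantity(line)))
# '1.5 5'
# '½'
# '¼ – ½'
# '1'
```

The second sample stops at ½ because the opening parenthesis is not in the separator class, which is the behaviour the question asks for.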

If you want to add the EM DASH too, copy and paste it at the end of the character class.


If your tool/language supports Unicode property escapes (many do, for example PCRE, Java, and .NET), you can use the Unicode dash punctuation category Pd:

^(?:[\p{Pd}./\s]*[\d↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟])+

which is more readable, and also covers the em dash and the other Unicode dashes.
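Not every engine accepts \p{Pd}. Python's built-in re module, for one, does not (the third-party regex package does). As a workaround sketch under that assumption, the dash class can be built from the standard unicodedata module instead. The names DASHES, FRACTIONS, QUANTITY_PD, and leading_quantity_pd are illustrative only.

```python
import re
import sys
import unicodedata

# Every code point whose general category is Pd (Dash_Punctuation):
# hyphen-minus, en dash, em dash, and the rest.
DASHES = "".join(
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Pd"
)

FRACTIONS = "↉½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒⅟"

# re.escape keeps the hyphen-minus literal wherever it lands in the class.
QUANTITY_PD = re.compile(
    r"^(?:[" + re.escape(DASHES) + r"./\s]*[\d" + FRACTIONS + r"])+"
)

def leading_quantity_pd(line):
    """Return the leading quantity of an ingredient line, or None."""
    m = QUANTITY_PD.match(line)
    return m.group(0) if m else None

print(repr(leading_quantity_pd("¼ \u2013 ½ teaspoon cream")))  # en dash
print(repr(leading_quantity_pd("¼ \u2014 ½ teaspoon cream")))  # em dash
print(repr(leading_quantity_pd("½ (1 cup) heavy cream")))      # stops at "("
```

Scanning the whole code space once at import time is slow compared with a literal class, so this only makes sense when the set of dashes in the input is not known in advance.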

Upvotes: 3
