Reputation: 237
I can't seem to get this regex quite right. I need to match a number range from 00yo to 16yo but exclude any matches past 16.
The regex I am using at the moment is: \b[0-1]?[0-9][\s\S]?yo\b
but it does not exclude matches past 16 and will match 50yo.
Please note that I am searching data on a raw hard drive with the data only accessible in a stream. I cannot use ^ or $ to anchor the match (the only option is to bookend the regex with a 'not' statement). I am using \b to limit the number of false positive matches. There is more than 1 TB of data, so I am trying to keep false positives to a minimum and search speed to a maximum.
Examples of a VALID match from 0 to 16 are:
0 yo
0yo
0-yo
0_yo
00 yo
00yo
00-yo
00_yo
7 yo
7yo
7-yo
7_yo
07 yo
07yo
07-yo
07_yo
14 yo
14yo
14-yo
14_yo
Examples of NO match are anything above 16, e.g.:
20 yo
20yo
20-yo
20_yo
I am hoping to allow the joining character (i.e. - or _) to be any whitespace or non-whitespace character, so that 14>yo would also match.
Any help is much appreciated.
Upvotes: 1
Views: 153
Reputation: 627102
You need to exclude digits from matching between the number and yo (right now, [\S\s] matches them).
I suggest:
\b(?:1[0-6]|0?[0-9])\D?yo\b
See regex demo
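As a quick sanity check, the suggested pattern can be run against the examples from the question. This is only a sketch using Python's re module; the names AGE_YO and matches are made up for illustration, and the engine actually used to scan the drive may treat \b or \D slightly differently.

```python
import re

# Pattern suggested above, compiled once so it can be reused.
AGE_YO = re.compile(r"\b(?:1[0-6]|0?[0-9])\D?yo\b")

# Samples taken from the question, plus the boundary cases 16yo and 17yo.
valid = ["0 yo", "0yo", "0-yo", "0_yo", "00 yo", "00yo",
         "7_yo", "07-yo", "14 yo", "14>yo", "16yo"]
invalid = ["20 yo", "20yo", "20-yo", "20_yo", "50yo", "17yo"]

def matches(text):
    """Return True if the pattern is found anywhere in text."""
    return AGE_YO.search(text) is not None

for sample in valid + invalid:
    print(repr(sample), matches(sample))
```

Every string in valid should report True and every string in invalid should report False.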
Explanation:
\b - word boundary
(?:1[0-6]|0?[0-9]) - 2 alternatives:
1[0-6] - 1 followed by a digit from 0 to 6
| - or...
0?[0-9] - optional 0 followed by any digit
\D? - one or zero non-digit characters (note you can further restrict it by turning it into a negated character class [^\d]?, and add more characters there)
yo\b - a whole word yo
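To illustrate the note about the negated character class, here is one possible tightening, sketched in Python. Excluding ASCII letters from the joining character is purely an assumption made for this example (the question asks for any character to be allowed), and the names LOOSE and STRICT are hypothetical.

```python
import re

# Pattern as suggested: the joining character is any single non-digit.
LOOSE = re.compile(r"\b(?:1[0-6]|0?[0-9])\D?yo\b")

# Assumed variant: the joining character may be neither a digit nor an
# ASCII letter, which rejects strings such as "7xyo".
STRICT = re.compile(r"\b(?:1[0-6]|0?[0-9])[^\dA-Za-z]?yo\b")

for sample in ["7 yo", "7-yo", "7_yo", "14>yo", "7yo", "7xyo", "20yo"]:
    loose_hit = LOOSE.search(sample) is not None
    strict_hit = STRICT.search(sample) is not None
    print(repr(sample), loose_hit, strict_hit)
```

Only "7xyo" differs between the two patterns: the \D? version accepts it, while the stricter class rejects it. Whether that trade-off is worthwhile depends on how many false positives the looser joiner produces in the real data.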
Upvotes: 1