Reputation: 3538
I'm using PCRE (PHP) regular expressions and have developed the following pattern:
(?:-?)(?:[A-Z’\s\.-]{8}.*)(?:NY\s?|ON\s?|FL\s){1}([A-Z].*)(?:M\s\d{1,2}\s.*|F\s\d{1,2}.*)
I'm trying to apply it to the following strings. Below each target string I have listed the desired match and the actual match:
SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1
Desired Match: HIT IT ONCE MORE
Actual Match: CE MORE
LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
Desired Match: SUMMATION TIME
Actual Match: TIME
TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9
Desired Match: DONWORTH
Actual Match: WORTH
In each case instead of stopping at the first occurrence of a match, the regular expression is consuming more of the string and matching the second match occurrence.
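The behaviour can be reproduced outside PHP. This is a minimal sketch, assuming Python's `re` module as a stand-in for PCRE (the pattern uses no PCRE-only syntax); the helper name `actual_match` is made up for illustration:

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"(?:-?)(?:[A-Z’\s\.-]{8}.*)(?:NY\s?|ON\s?|FL\s){1}([A-Z].*)"
    r"(?:M\s\d{1,2}\s.*|F\s\d{1,2}.*)"
)

SAMPLES = [
    "SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
]

def actual_match(line):
    """Return capture group 1 with surrounding whitespace trimmed, or None."""
    m = PATTERN.search(line)
    return m.group(1).strip() if m else None

for line in SAMPLES:
    print(actual_match(line))
# CE MORE
# TIME
# WORTH
```

On these inputs the captured text starts right after the last `ON` in the line, which sits inside `ONCE`, `SUMMATION`, and `DONWORTH`.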
You can see a working example here at regex101.com: WORKING EXAMPLE
How do I get my regular expression to stop at the first match so I achieve my desired output? I'd also welcome any pointers on how I can improve my expression.
Thanks for your input.
Upvotes: 1
Views: 117
Reputation: 2369
Well, a simpler (but not more efficient) approach:
/^.+(?:NY|FL|KY)\s?(.+?)(?: M.*)$/gmi
Will bring:
Try it: https://regex101.com/r/yX2bI1/4
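To check this pattern without regex101, here is a rough sketch in Python's `re` (an assumption for illustration; the question is about PHP). `findall()` stands in for the `g` flag, and `re.M | re.I` for `m` and `i`. The `TIGHTENED` variant is hypothetical and not part of the pattern above: it requires a single `M` or `F` followed by a one- or two-digit number, because a bare ` M` also matches the start of ` MORE` in the first sample.

```python
import re

SAMPLES = """\
SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1
LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9"""

# Pattern as posted in this answer.
AS_POSTED = re.compile(r"^.+(?:NY|FL|KY)\s?(.+?)(?: M.*)$", re.M | re.I)

# Hypothetical tightening: a word boundary before the state code, and
# " M <digits> " or " F <digits> " as the terminator. Case-sensitive.
TIGHTENED = re.compile(r"^.+\b(?:NY|FL|KY)\s?(.+?) [MF] \d{1,2} .*$", re.M)

print(AS_POSTED.findall(SAMPLES))
# ['HIT IT ONCE', 'SUMMATION TIME', 'DONWORTH']
print(TIGHTENED.findall(SAMPLES))
# ['HIT IT ONCE MORE', 'SUMMATION TIME', 'DONWORTH']
```

Both variants still assume the state code is one of `NY`, `FL`, or `KY`; other codes would need to be added to the alternation.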
Upvotes: 1
Reputation: 15010
^(?:[^ \n]* +){4}(.*?) +[a-z] +[0-9]+ [0-9a-z]+ Race [0-9]+$
Live Demo
https://regex101.com/r/kF9cU8/2
Sample text
SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1
Desired Match: HIT IT ONCE MORE
Actual Match: CE MORE
LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
Desired Match: SUMMATION TIME
Actual Match: TIME
TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9
Desired Match: DONWORTH
Actual Match: WORTH
Sample Matches
MATCH 1
1. [33-49] `HIT IT ONCE MORE`
MATCH 2
1. [145-159] `SUMMATION TIME`
MATCH 3
1. [258-266] `DONWORTH`
NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (?:                      group, but do not capture (4 times):
----------------------------------------------------------------------
    [^ \n]*                  any character except: ' ', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
     +                       ' ' (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  ){4}                     end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
   +                       ' ' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
   +                       ' ' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [0-9a-z]+                any character of: '0' to '9', 'a' to 'z'
                           (1 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
   Race                    ' Race '
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
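The pattern can also be run as a short script. This is a sketch in Python's `re` rather than PHP (an assumption for illustration). The multiline and case-insensitive flags are assumed as well: `^`/`$` have to anchor per line, and `[a-z]` has to match the uppercase `M`.

```python
import re

SAMPLE_TEXT = """\
SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1
LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9"""

# Skip four space-separated tokens, then lazily capture everything up to
# the trailing "<letter> <number> <token> Race <number>" fields.
PATTERN = re.compile(
    r"^(?:[^ \n]* +){4}(.*?) +[a-z] +[0-9]+ [0-9a-z]+ Race [0-9]+$",
    re.MULTILINE | re.IGNORECASE,
)

names = [m.group(1) for m in PATTERN.finditer(SAMPLE_TEXT)]
print(names)
# ['HIT IT ONCE MORE', 'SUMMATION TIME', 'DONWORTH']
```

Skipping exactly four tokens assumes the state code is always the fourth space-separated field. If it runs into the name, as in `NYHIT IT ONCE MORE`, the capture would start one word late (`IT ONCE MORE`).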
Upvotes: 1