Mutuelinvestor

Reputation: 3538

How do you ensure that your regular expression matches the first occurrence of a match

I'm using PCRE (PHP) regular expressions and have developed the following expression:

(?:-?)(?:[A-Z’\s\.-]{8}.*)(?:NY\s?|ON\s?|FL\s){1}([A-Z].*)(?:M\s\d{1,2}\s.*|F\s\d{1,2}.*)

I'm trying to apply it to the following strings. Below each target string I have provided the desired match and the actual match:

SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1
Desired Match: HIT IT ONCE MORE
Actual Match: CE MORE  

LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
Desired Match: SUMMATION TIME
Actual Match: TIME  

TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9
Desired Match: DONWORTH
Actual Match: WORTH

In each case, instead of stopping at the first occurrence of a match, the regular expression consumes more of the string and matches at the second occurrence.
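
For anyone who wants to reproduce this outside regex101, here is a minimal sketch. It uses Python's `re` module rather than PHP's `preg_match` (an assumption made purely for illustration; this particular pattern uses no PCRE-only syntax, so it should behave the same way). The greedy `.*` first runs to the end of the line and then backs up, so the alternation ends up matching the rightmost `ON`, which sits inside `ONCE`, `SUMMATION` and `DONWORTH`.

```python
import re

# The pattern exactly as posted above, split across two string literals.
pattern = re.compile(
    r"(?:-?)(?:[A-Z’\s\.-]{8}.*)(?:NY\s?|ON\s?|FL\s){1}([A-Z].*)"
    r"(?:M\s\d{1,2}\s.*|F\s\d{1,2}.*)"
)

samples = [
    "SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
]

# Group 1 carries a trailing space, so strip it before comparing.
actual = [pattern.search(s).group(1).strip() for s in samples]
print(actual)  # ['CE MORE', 'TIME', 'WORTH']
```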

You can see a working example here at regex101.com: WORKING EXAMPLE

How do I get my regular expression to stop at the first match so I achieve my desired output? I'd also welcome any pointers on how I can improve my expression.

Thanks for your input.

Upvotes: 1

Views: 117

Answers (2)

Jaumzera

Reputation: 2369

Well, a simpler (but not more efficient) approach:

/^.+(?:NY|FL|KY)\s?(.+?)(?: M.*)$/gmi

This returns:

  1. "HIT IT ONCE"
  2. "SUMMATION TIME"
  3. "DONWORTH"

Try it: https://regex101.com/r/yX2bI1/4
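
To check this pattern locally, here is a rough sketch using Python's `re` module (an assumption for illustration; the original is a PCRE pattern, and `re.findall` with `MULTILINE | IGNORECASE` stands in for the `gmi` flags). It also shows why the first result stops at "HIT IT ONCE": with the `i` flag, the lazy group ends at the first ` M`, and that is the start of ` MORE`. The second pattern, `with_digit`, is a hypothetical variant that is not part of the answer; it requires a digit after the `M` and assumes the sex marker is always followed by a number.

```python
import re

text = "\n".join([
    "SUNDAY GEISHA-SUNDAY BREAK-JP NYHIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
])

# Python has no "g" flag; findall() returns every match instead.
flags = re.MULTILINE | re.IGNORECASE

# The pattern as posted in this answer.
as_posted = re.findall(r"^.+(?:NY|FL|KY)\s?(.+?)(?: M.*)$", text, flags)
print(as_posted)   # ['HIT IT ONCE', 'SUMMATION TIME', 'DONWORTH']

# Hypothetical variant: only stop at " M" when a digit follows it.
with_digit = re.findall(r"^.+(?:NY|FL|KY)\s?(.+?)(?: M \d.*)$", text, flags)
print(with_digit)  # ['HIT IT ONCE MORE', 'SUMMATION TIME', 'DONWORTH']
```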

Upvotes: 1

Ro Yo Mi

Reputation: 15010

Description

^(?:[^ \n]* +){4}(.*?) +[a-z] +[0-9]+ [0-9a-z]+ Race [0-9]+$

Example

Live Demo

https://regex101.com/r/kF9cU8/2

Sample text

SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1
Desired Match: HIT IT ONCE MORE
Actual Match: CE MORE  

LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6
Desired Match: SUMMATION TIME
Actual Match: TIME  

TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9
Desired Match: DONWORTH
Actual Match: WORTH

Sample Matches

MATCH 1
1.  [33-49]   `HIT IT ONCE MORE`

MATCH 2
1.  [145-159] `SUMMATION TIME`

MATCH 3
1.  [258-266] `DONWORTH`

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (?:                      group, but do not capture (4 times):
----------------------------------------------------------------------
    [^ \n]*                  any character except: ' ', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
     +                       ' ' (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  ){4}                     end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
   +                       ' ' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
   +                       ' ' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [0-9a-z]+                any character of: '0' to '9', 'a' to 'z'
                           (1 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
   Race                    ' Race '
----------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
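
The walkthrough above can be tried with a short script. This is only a sketch: it uses Python's `re` module instead of PCRE, and the flags are an assumption, since the answer does not list them (`[a-z]` matching the upper-case `M` implies case-insensitive matching, and `^`/`$` working per line implies multiline mode). Only the three data lines from the sample text are included.

```python
import re

sample_text = "\n".join([
    "SUNDAY GEISHA-SUNDAY BREAK-JP NY HIT IT ONCE MORE M 13 1116 Race 1",
    "LOAD UP-DOVE HUNT FL SUMMATION TIME M 11 6T Race 6",
    "TEMPLE STREET-STREET CRY-IR KY DONWORTH M 12 1 Race 9",
])

# Skip four space-separated fields, then lazily capture everything up to
# the "<letter> <number> <token> Race <number>" tail of the line.
pattern = re.compile(
    r"^(?:[^ \n]* +){4}(.*?) +[a-z] +[0-9]+ [0-9a-z]+ Race [0-9]+$",
    re.MULTILINE | re.IGNORECASE,  # assumed flags, see note above
)

names = pattern.findall(sample_text)
print(names)  # ['HIT IT ONCE MORE', 'SUMMATION TIME', 'DONWORTH']
```

Note that the pattern does not look for the state code at all. It depends on exactly four space-separated fields coming before the name, so a line with a different number of leading fields would capture the wrong span.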

Upvotes: 1
