cphill

Reputation: 5914

Javascript Regex must contain pattern, but not at the end of the string

I am trying to filter some reporting results (Google Analytics, JavaScript regex support) to include only rows that contain the pattern "OA", with the restriction that "OA" cannot be the last characters in the string. My regex below handles the "last characters in the string" requirement, but it doesn't restrict the match to rows that contain some instance of "OA". Should I add another OR alternative to capture that, or should I update my current regex to account for it?

E.g. Text (Expected results):

OA > OA //No Match
Paid Search > OA //No Match
Paid Search > (none) > Social //No Match
OA > Paid Search //Match
Social > OA > (none) > (none) //Match

Regex:

.{,2}$|.*[^OA]$

Upvotes: 0

Views: 502

Answers (2)

The fourth bird

Reputation: 163207

You could match OA and then make sure that the string does not end with OA:

^.*OA.*(?:[^O]A|O[^A]|[^O][^A])$

Explanation:

^          # Start of the string
.*OA       # Any characters, zero or more times, then OA
.*         # Any characters, zero or more times
(?:        # Non-capturing group for the last two characters
  [^O]A    # Any character except O, then A
  |        # or
  O[^A]    # O, then any character except A
  |        # or
  [^O][^A] # Any character except O, then any character except A
)          # Close the non-capturing group
$          # End of the string
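
To check the pattern against the sample rows from the question, a minimal sketch along these lines could be used. The names `questionRows` and `containsOaNotAtEnd` are made up for illustration, and the expected values are taken from the question's Match / No Match comments.

```javascript
// Sample rows from the question, each paired with the expected outcome.
const questionRows = [
  ["OA > OA", false],
  ["Paid Search > OA", false],
  ["Paid Search > (none) > Social", false],
  ["OA > Paid Search", true],
  ["Social > OA > (none) > (none)", true],
];

// The pattern from this answer: contains OA, and the last two characters are not "OA".
const containsOaNotAtEnd = /^.*OA.*(?:[^O]A|O[^A]|[^O][^A])$/;

// Run the pattern against every row and keep the boolean results.
const rowResults = questionRows.map(([text]) => containsOaNotAtEnd.test(text));

questionRows.forEach(([text, expected], i) => {
  console.log(`${text} -> ${rowResults[i]} (expected ${expected})`);
});
```

Note that the pattern needs at least two characters after an OA, so a string with only one character after it, such as "OAx", is not matched.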

Upvotes: 0

Jason Hu

Reputation: 6333

What about the following:

OA.(?!.*OA$)

It additionally requires one more character after OA, which guarantees that the match is never the OA at the very end of the string; the negative lookahead then checks that the rest of the string does not end with OA.

I don't program in JavaScript, so I don't know whether your engine supports this. I tested locally with grep, using grep -P 'OA.(?!.*OA$)', and it works for your examples.
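
For a JavaScript check of the same thing, here is a minimal sketch that runs the lookahead pattern over the sample rows from the question. The variable names are made up for illustration, and `anchoredPattern` is a hypothetical variant added here as an assumption; it is not part of the answer above.

```javascript
// Sample rows from the question, each paired with the expected outcome.
const sampleRows = [
  ["OA > OA", false],
  ["Paid Search > OA", false],
  ["Paid Search > (none) > Social", false],
  ["OA > Paid Search", true],
  ["Social > OA > (none) > (none)", true],
];

// Pattern from the answer: OA, one more character, then no OA at the very end.
const lookaheadPattern = /OA.(?!.*OA$)/;

// Hypothetical variant (an assumption, not from the answer): first reject any
// string that ends with OA, then require an OA somewhere in it.
const anchoredPattern = /^(?!.*OA$).*OA/;

const lookaheadResults = sampleRows.map(([text]) => lookaheadPattern.test(text));
const anchoredResults = sampleRows.map(([text]) => anchoredPattern.test(text));

sampleRows.forEach(([text, expected], i) => {
  console.log(`${text} -> ${lookaheadResults[i]} / ${anchoredResults[i]} (expected ${expected})`);
});
```

One edge case may be worth checking against real data: for a string like "OAOA", the character consumed after the first OA is the O of the trailing OA, so `lookaheadPattern` still matches, while the anchored variant does not.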


If negative lookahead is not available, you can spell out what the negative lookahead would actually do:

(OA.*(O[^A]|[^O].)|OA.)$

The trick here is to come up with an automaton that rejects only OA at the end. If an O is seen, the next character must be anything but A; otherwise, any character is acceptable. Writing that out explicitly as a regular expression gives the first alternative of the expression above.

The second alternative fills a gap. Because the first alternative requires the matched string to have length >= 4, the second covers the corner case of length 3, so that the whole expression is meant to accept the same strings as the negative-lookahead version.
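
As a rough check of the lookahead-free expression, the sketch below runs it over the sample rows from the question plus two short strings that exercise each alternative. The names `checkRows` and `noLookaheadPattern` are made up for illustration.

```javascript
// Sample rows from the question, plus two short strings, with expected outcomes.
const checkRows = [
  ["OA > OA", false],
  ["Paid Search > OA", false],
  ["Paid Search > (none) > Social", false],
  ["OA > Paid Search", true],
  ["Social > OA > (none) > (none)", true],
  ["OAx", true],   // length 3: only the second alternative (OA.) can match
  ["OAOA", false], // ends with OA, so neither alternative matches
];

// Lookahead-free expression from the answer.
const noLookaheadPattern = /(OA.*(O[^A]|[^O].)|OA.)$/;

const noLookaheadResults = checkRows.map(([text]) => noLookaheadPattern.test(text));

checkRows.forEach(([text, expected], i) => {
  console.log(`${text} -> ${noLookaheadResults[i]} (expected ${expected})`);
});
```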

Upvotes: 1
