Reputation: 5914
I am trying to filter some reporting results (Google Analytics - JavaScript regex support) to include only rows that contain the pattern "OA", with the condition that "OA" cannot be the last characters in the string. My regex below handles the "last characters in the string" condition, but it doesn't restrict the match to rows that actually contain "OA". Should I add another OR alternative to capture that, or should I update my current regex to account for it?
E.g. Text (Expected results):
OA > OA //No Match
Paid Search > OA //No Match
Paid Search > (none) > Social //No Match
OA > Paid Search //Match
Social > OA > (none) > (none) //Match
Regex:
.{,2}$|.*[^OA]$
Upvotes: 0
Views: 502
Reputation: 163207
You could match OA and then make sure that the string does not end with OA:
^.*OA.*(?:[^O]A|O[^A]|[^O][^A])$
That would match
^                 # Begin of the string
.*OA              # Match any character zero or more times, then match OA
.*                # Match any character zero or more times
(?:               # Non-capturing group
  [^O]A           # Match not O, then A
  |               # or
  O[^A]           # Match O, then not A
  |               # or
  [^O][^A]        # Match not O, then not A
)                 # Close non-capturing group
$                 # End of the string
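As a quick check, here is a minimal JavaScript sketch that runs the pattern against the sample rows from the question. The names `rows`, `pattern` and `matched` are made up for illustration. One thing to be aware of: the pattern expects at least two characters after the OA it matches, so a very short string such as OAx is not matched.

```javascript
// Pattern from the answer above: the string contains OA and its last two
// characters are not "OA".
const pattern = /^.*OA.*(?:[^O]A|O[^A]|[^O][^A])$/;

// Sample rows from the question (illustrative variable name).
const rows = [
  "OA > OA",
  "Paid Search > OA",
  "Paid Search > (none) > Social",
  "OA > Paid Search",
  "Social > OA > (none) > (none)",
];

// Keep only the rows the pattern accepts.
const matched = rows.filter((row) => pattern.test(row));
console.log(matched);
```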
Upvotes: 0
Reputation: 6333
What about the following:
OA.(?!.*OA$)
It additionally requires one more character after OA, which guarantees that it does not match a trailing OA; the negative lookahead then asserts that the rest of the string does not end with OA.
I do not program in JavaScript, so I don't know if your engine supports that. Locally I tested with grep using grep -P 'OA.(?!.*OA$)' and it works for your examples.
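For what it's worth, JavaScript's own regex engine does support negative lookahead, so the pattern can be tried there directly. The sketch below is only a check against the sample rows; the variable names are invented for illustration, and whether the reporting tool itself accepts lookahead is a separate question. It also shows one edge case: a string like OAOA still matches, because the character consumed by the dot can be the O of the final OA. Anchoring the lookahead at the start of the string, as in the hypothetical `anchoredPattern` variant, avoids that.

```javascript
// Lookahead pattern from this answer.
const lookaheadPattern = /OA.(?!.*OA$)/;

// Variant (an assumption, not from the answer): reject strings ending in OA
// up front, then require OA somewhere in the string.
const anchoredPattern = /^(?!.*OA$).*OA/;

// Sample rows from the question (illustrative variable name).
const sampleRows = [
  "OA > OA",
  "Paid Search > OA",
  "Paid Search > (none) > Social",
  "OA > Paid Search",
  "Social > OA > (none) > (none)",
];

const lookaheadMatches = sampleRows.filter((row) => lookaheadPattern.test(row));
const anchoredMatches = sampleRows.filter((row) => anchoredPattern.test(row));
console.log(lookaheadMatches);
console.log(anchoredMatches);

// Edge case: overlapping OA at the end of the string.
console.log(lookaheadPattern.test("OAOA"), anchoredPattern.test("OAOA"));
```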
If negative lookahead is not available, you can spell out what the negative lookahead would actually do:
(OA.*(O[^A]|[^O].)|OA.)$
The trick here is to come up with an automaton that rejects only OA at the end. If O is seen, then you don't want A but anything else; otherwise, any character is acceptable. Writing that out explicitly as a regular expression gives the first part of the expression I proposed above.
The second part fills a gap: because the first part only matches strings of length >= 4, the second alternative covers the corner case and lowers the minimum length to 3, which is intended to accept the same set of strings as the negative lookahead implementation.
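A small JavaScript sketch of the lookahead-free version, checked against the sample rows plus the short three-character case that the second alternative is there for. The variable names are invented for illustration.

```javascript
// Lookahead-free pattern from this answer:
//   OA.*(O[^A]|[^O].)  -> OA, then anything, then two final characters that are not "OA"
//   OA.                -> OA followed by exactly one final character
const explicitPattern = /(OA.*(O[^A]|[^O].)|OA.)$/;

// Sample rows from the question, plus one short string (illustrative).
const testRows = [
  "OA > OA",
  "Paid Search > OA",
  "Paid Search > (none) > Social",
  "OA > Paid Search",
  "Social > OA > (none) > (none)",
  "OAx",
];

const explicitMatches = testRows.filter((row) => explicitPattern.test(row));
console.log(explicitMatches);
```

The short string OAx is accepted only through the second alternative, which is the length-3 corner case described above.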
Upvotes: 1