Ray

Reputation: 2258

Regex for matching "everything but" a string

I'm looking for a regular expression that will match all strings EXCEPT those that contain a certain string within. Can someone help me construct it?

For example, I'm looking for all strings that do not contain a, b, and c in that order.

So
abasfaf3 would match, whereas
asasdfbasc would not

Upvotes: 1

Views: 9148

Answers (4)

Alan

Reputation: 46903

In Perl:

if ($str !~ /a.*b.*c/)
{
    print "match";
}

should work.

Upvotes: 2

Federico A. Ramponi

Reputation: 47085

In Python:

>>> import re
>>> r = re.compile("(?!^.*a.*b.*c.*$)")
>>> r.match("abc")
>>> r.match("xxabcxx")
>>> r.match("ab ")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("abasfaf3")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("asasdfbasc")
>>>

Upvotes: 4

Johannes Schaub - litb

Reputation: 507373

Well, you can theoretically build a regex that matches the opposite. But for longer strings, that regex would become big. The way you would do that systematically is (greatly simplified):

  • Convert the regular expression into a deterministic finite automaton
  • Swap the accepting and non-accepting states of the automaton, so that it accepts the complement of the regular language
  • Convert the automaton back to a regular expression by successively removing nodes from the automaton while keeping its behavior the same. Removing one node requires combining two or more regular expressions so that they account for the removed node.
  • Once only one start node and one end node remain, you are finished: the regular expression labeling the edge between them is the one you are looking for.
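
For the a, b, c example from the question, the steps above can be worked through by hand. The following is only a minimal sketch in Python, assuming a four-state automaton whose state counts how many letters of "abc" have been seen in order. The names WANTED, step, lacks_abc and COMPLEMENT are made up for illustration.

```python
import re

# State i means: the first i letters of "abc" have been seen in order.
# State 3 is the only accepting state of the automaton for "contains a...b...c".
WANTED = "abc"

def step(state: int, ch: str) -> int:
    """Advance the automaton by one character."""
    if state < len(WANTED) and ch == WANTED[state]:
        return state + 1
    return state

def lacks_abc(text: str) -> bool:
    """Run the inverted automaton: every state except state 3 accepts."""
    state = 0
    for ch in text:
        state = step(state, ch)
    return state != len(WANTED)

# Eliminating states 0, 1 and 2 by hand gives this equivalent expression.
COMPLEMENT = re.compile(r"[^a]*(?:a[^b]*(?:b[^c]*)?)?")

for s in ["abasfaf3", "asasdfbasc", "abc", "cba", ""]:
    # The hand-derived regex and the automaton should always agree.
    assert lacks_abc(s) == bool(COMPLEMENT.fullmatch(s))
    print(repr(s), lacks_abc(s))
```

Even for this three-letter case the resulting expression is harder to read than a.*b.*c, which is why the approach gets unwieldy for longer strings.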

Practically, you can just match for the string you do not want it to contain, and invert the result. Here is what it would look like in awk:

echo azyxbc | awk '{ exit ($0 !~ /a.*b.*c/); }' && echo matched
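
The same "match, then invert" idea can be sketched in Python. This is one possible version rather than a definitive one: the helper name is_allowed and the use of re.DOTALL are assumptions made for the example.

```python
import re

# Pattern for the unwanted content: a, then b, then c, in that order.
# re.DOTALL lets "." cross newlines; drop it for line-by-line behavior.
UNWANTED = re.compile(r"a.*b.*c", re.DOTALL)

def is_allowed(text: str) -> bool:
    """Return True when text does NOT contain a...b...c."""
    return UNWANTED.search(text) is None

print(is_allowed("abasfaf3"))    # True
print(is_allowed("asasdfbasc"))  # False
```

Keeping the negation in ordinary code leaves the pattern itself short and readable.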

If you are interested in this, I recommend the book "Introduction to the Theory of Computation" by Michael Sipser.

Upvotes: 1

VonC

Reputation: 1329322

In Java:

(?m)^a?(.(?!a[^b\r\n]*b[^\r\nc]*c))+$

This matches:

abasfaf3
xxxabasfaf3

This does not match:

asasdfbascf
xxxxasasdfbascf

Upvotes: 0
