Reputation: 127

Randomly generate a string that does NOT match a given regular expression

For testing purposes on a project I'm working on, I have a need to, if given a regular expression, randomly generate a string that will FAIL to be matched by it. For instance, if I'm given this regex:

^[abcd]d+

Then I should be able to generate strings such as:

hnbbad
uduebbaef
9f8;djfew
skjcc98332f

...each of which does NOT match the regex, but NOT generate:

addr32
bdfd09usdj
cdddddd-9fdssee

...each of which DO. In other words, I want something like an anti-Xeger.

Does such a library exist, preferably in Python (if I can understand the theory, I can most likely convert it to Python if need be)? I gave some thought to how I could write this, but given the scope of regular expressions, it seemed that might be a much harder problem than what things like Xeger can tackle. I also looked around for a pre-made library to do this, but either I'm not using the right keywords to search or nobody's had this problem before.

Upvotes: 7

Answers (4)

Priya Ananthasankar

Reputation: 1

Can we reduce the infinite number of possibilities, by restricting to generate strings from a give character set.

For example, I can define the character set, [QWERTYUIOP!@#$%^%^&*))_] and all the strings I generate randomly should be born from this set. That way we can reduce the infinite nature of this problem?

In fact even I am looking for a utility like this, preferably in Python.

Upvotes: 0

ridgerunner

Reputation: 34395

No this is impossible. There are an infinite number of regexes that match every string in the known universe. For example:

/^/
/.*/
/[^"\\]*(\\.[^"\\]*)*$/

etc.

This is because all these regexes can match nothing at all (which is something all strings have!)

Upvotes: 2

Mark Byers

Reputation: 838706

My initial instinct is, no, such a library does not exist because it's not possible. You can't be sure that you can find a valid input for any arbitrary regular expression in a reasonable amount of time.

For example, proving whether a number is prime is believed to be a hard to solve mathematical problem. The following regular expression matches any string which is at least 10000 characters long and whose total length is a prime number:

(?!(..+)\1+$).{10000}

I doubt that any library exists that can find a valid input to this regular expression in reasonable time. And this is a very easy example with a simple solution, e.g. 'x' * 10007 will work. It would be possible to come up with other regular expressions that are much harder to find valid inputs for.

I think the only way you are going to solve this is if you limit yourself to some subset of all possible regular expressions.

But having said that if you have a magical library that generates text that matches for any arbitrary regular expression then all you need to do is generate a regular expression that matches all the strings that don't match your original expression.

Luckily this is possible using a negative lookahead:

^(?![\s\S]*(?:^[abcd]d+))

If you are willing to change the requirements to only allow a limited subset of regular expressions then you can negate the regular expression by using boolean logic. For example if ^[abcd]d+ becomes ^[^abcd]|^[abcd][^d]. It is then possible to find a valid input for this regular expression in reasonable time.

Upvotes: 6

felixgaal

Reputation: 2423

I would do a loop, generating random combinations of random length, and test if matches the regexp. Repeat the loop until a not-match situation is reached.

Obviously, this would be inefficient. Are you sure you cannot invert the regexp and generate a match on the inverted regexp?

Upvotes: 3

Randomly generate a string that does NOT match a given regular expression

Answers (4)

Related Questions