Reputation: 666
How can I generate string by given regex in ruby?
Class MyRegex
def self.generate_string(reg)
#return generated string.
end
end
When I call
MyRegex.generate_string(/a*/) #it will return random string.
expecting output:
aaa
aa
aaaaa
and so on
Upvotes: 2
Views: 1471
Reputation: 28305
A bit late to the party, but I have created a powerful ruby gem which solves the original problem:
https://github.com/tom-lord/regexp-examples
/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com', 'http://www.github.com', 'https://github.com', 'https://www.github.com']
Upvotes: 3
Reputation: 6258
The short answer is that you can't, as some of the strings could be infinite if you allow the *
, +
or repetitions which are open on the right eg. {4,}
.
If you want to do it any way, then you have two strategies, both of which starts with parsing the regex, and building a state machine representing it.
Then you can either generate a random run through it of max length 'n'. This will give you a random string of at most length n
. Or you can add an empty transition to all the states in your state machine to a terminal state, and simply do a random walk until you hit a terminal state. This will give you a completely random string, which the regex accepts, where the length has an arbitrary length, but longer strings are less probable. But please not that there is still an, albeit very very small, chance that this method will never end, as the string size grows, then the probability of outputting a new character falls, but just as the string length never hits infinite, neither does the probability of a new character hit zero.
That is almost exactly what the code posted in the comments by @neil-slater https://github.com/repeatedly/ruby-string-random
Edit
The OP asks if it is possible to generate a random string, which a given regular expression matches.
A regular expression is a string representation of a regular language. A finite automaton is a decider which encodes a given regular language, and can determine if a given string is part of that regular language. So basically the way regular expression matching works, is by compiling the regular expression to a finite automaton, and use that to see if it accepts the string. That's matching.
Now lets look at generation. You can use the same finite automaton to generate strings, however as a finite automata, as @sawa correctly pointed out, only works on finite strings, then you have to make sure that you only generate a finite string. One way of doing this is randomly deciding a maximum length, and then do a random walk of at most that length in the fintite automaton. One way of not doing this is the way both @sawa and I suggested of taking a transition with some probability, or simply stopping. As this potentially doesn't terminate, because the product of any non-zero probabilities, only approaches zero, but new reaches it.
Upvotes: 1
Reputation: 168101
This answer is not intended to fully answer you question. Its purpose is twofold: (i) to show that it is not impossible, so that jbr's answer is entirely wrong, and (ii) to suggest you that, nevertheless, it is not trivial, and you have to work out the complete code by yourself.
Since to fully answer your question is would probably not fit the space for a single answer in a Q and A site like this, I will show a code that generates all the possible strings that matches a fixed regex:
/a*/
The code is like this:
class Regexp
def self.generate_string
rand > 0.5 ? "" : "a#{generate_string}"
end
end
Each time you run Regexp.generate_string
, a random string that matches /a*/
will be generated. The string would be of an arbitrary length, and the longer the string is, it will be generated with less possibility.
Upvotes: 0