J L
J L

Reputation: 2297

How to generate strings from simple Regex?

So if I have simple regex such as:

"g{1,3}(a|e|i|o|u)"

I want my program to generate strings of

ga
ge
gi
go
gu
gga
gge
ggi
ggo
ggu
ggga
ggge
gggi
gggo
gggu

I would not use "g*(a|e|i|o|u)" for regex, as there can be infinite number of 'g's and there will be infinite number of strings.

Any recommendation on simple efficient algorithm to make this? I think I will be able to make these strings in a brute force way by using for/while loops, but I'm wondering if there is any methods I could use to make this algorithm work.

I googled how to create strings from regex and many people seemed to redirect to: https://code.google.com/p/xeger/ to use the library that is built, but I was wondering if I could get some suggestions to make my own for these simple regex.

Upvotes: 1

Views: 212

Answers (3)

Sergiu Toarca
Sergiu Toarca

Reputation: 2759

I created Debuggex, which generates random strings to give you an idea of what a regex does.

If you already have a parse tree for your regex, you can use the following logic to generate random matches:

OrTree.random:
    Choose a child randomly, return its random()

ConcatTree.random:
    For every child, call random()
    Return the concatenation of all the results

RepeatTree.random:
    Choose a valid random number of repetitions within min and max
    Call random() on your child that many times
    Return the concatenation of all the results

Literal.random:
    Return the literal

You can generate random strings even if you use the * operator. This is done by choosing a distribution from 0 to infinity from which to generate numbers, just like you use a uniform distribution for finite sets. For example:

InfiniteRepeatTree.random:
    Flip a coin until I get tails
    Call random on child() the number of times that the coin landed heads
    Return concatenation of the results

Hope that helps :)

Upvotes: 1

Joe Elleson
Joe Elleson

Reputation: 1373

char[] vowels = new char[] {'a','e','i','o','u'};
for (int i = 1; i <= 3; i++) {
    for (int j = 0; j < vowels.length; j++) {
         for (int k = 0; k < i; k++) {
             System.out.print("g");
         }
         System.out.println(vowels[j]);
    }
}

Not generic solution, but it works for your specific example

Upvotes: 0

aglassman
aglassman

Reputation: 2653

Xeger is open source. You could browse their code base for ideas.

EDIT:

Their code base looks very small, so shouldn't be too hard. It only generates random strings that will match, not all strings. It could still be a good starting point though.

Upvotes: 1

Related Questions