Reputation: 13
I have a string like this:
$words = "[a] (good|bad) word [for fun]";
where:
So, the possible outcomes from the above string would be like these:
a good word for fun
a bad word for fun
a good word
a bad Word
good word for fun
bad word for fun
good word
bad word
Can someone help me to find a way to extract all the possible outcome (like the example above) and store them in an array?
Thanks!
Upvotes: 0
Views: 209
Reputation: 98048
use warnings;
use strict;
use constant { OPT => 0, OR => 1, FIXED => 2 };
my $words = "[a] (good|bad) word [for fun]";
my @tokens;
# parse input
my @v = grep {$_} split /(\[|\]|\(|\||\))/, $words;
while (my $token = shift @v) {
if ($token eq '[') {
push @tokens, [ OPT, shift @v ];
shift @v; # ]
} elsif ($token eq '(') {
my @list;
do {
push (@list, [ FIXED, shift @v] );
} until (shift @v eq ')'); # '|,)'
push @tokens, [ OR, \@list ];
}
else {
push @tokens, [FIXED, $token];
}
}
# generate output
my @phrases = ("");
for my $token (@tokens) {
my @additions;
if ($token->[0] == OPT) {
push @additions, $_.$token->[1] for @phrases;
} elsif ($token->[0] == FIXED) {
$_ .= $token->[1] for @phrases;
} elsif ($token->[0] == OR) {
foreach my $list (@{$token->[1]}) {
push @additions, $_.$list->[1] for @phrases;
}
@phrases = ();
}
push @phrases, @additions;
}
print "$_\n" for map {s/^\s+//;s/[ ]+/ /g;$_} @phrases;
Upvotes: 2
Reputation: 118148
I saw this as an opportunity to try using Parse::RecDescent. I don't understand these things very well, so there might have been a better way to write the grammar.
The parser allows me to generate a list of sets of phrases to use. Then, I feed that list of sets to Set::CrossProduct to generate the Cartesian product of sets.
#!/usr/bin/env perl
use strict;
use warnings;
use Parse::RecDescent;
use Set::CrossProduct;
our @list;
my $parser = Parse::RecDescent->new(q{
List: OptionalPhrase |
AlternatingMandatoryPhrases |
FixedPhrase
OptionalPhrase:
OptionalPhraseStart
OptionalPhraseContent
OptionalPhraseEnd
OptionalPhraseStart: /\\[/
OptionalPhraseContent: /[^\\]]+/
{
push @::list, [ $item[-1], '' ];
}
OptionalPhraseEnd: /\\]/
AlternatingMandatoryPhrases:
AlternatingMandatoryPhrasesStart
AlternatingMandatoryPhrasesContent
AlternatingMandatoryPhraseEnd
AlternatingMandatoryPhrasesStart: /\\(/
AlternatingMandatoryPhrasesContent: /[^|)]+(?:[|][^|)]+)*/
{
push @::list, [ split /[|]/, $item[-1] ];
}
AlternatingMandatoryPhraseEnd: /\\)/
FixedPhrase: /[^\\[\\]()]+/
{
$item[-1] =~ s/\\A\\s+//;
$item[-1] =~ s/\s+\z//;
push @::list, [ $item[-1] ];
}
});
my $words = "[a] (good|bad) word [for fun]";
1 while defined $parser->List(\$words);
my $iterator = Set::CrossProduct->new(\@list);
while (my $next = $iterator->get) {
print join(' ', grep length, @$next), "\n";
}
Output:
a good word for fun a good word a bad word for fun a bad word good word for fun good word bad word for fun bad word
Upvotes: 1
Reputation: 2382
With regular expressions, you can determine if "bad word" matches your pattern "[a] (good|bad) word [for fun]" (which, as regex match, would probably be spelled as /(a )?(good|bad) word( for fun)?/
). But it sounds like you actually want to do the inverse, ie. generate all possible inputs from your pattern. This is not something regexes can do.
What you should be looking at is called permutations. Your template string has these parts:
So there are two possibilities for fragments 1 and 2, just one for fragment three, and again two for fragment 4, giving you 2 * 2 * 1 * 2 = 8 possiblities.
Just store all these possiblities in a multi-dimensional array, e.g.
my $sentence = [["a ", ""], ["good", "bad"], ["word"], ["for fun", ""]];
Then look up permutation algorithms or permutation modules on CPAN to find all the combinations.
As an example for a single permuation, "bad word" would be represented as:
my $badword =
$sentence->[0]->[0]
. $sentence->[1]->[1]
. $sentence->[2]->[0]
. $sentence->[3]->[0];
Upvotes: 1