Collin Watts
Collin Watts

Reputation: 51

Algorithm to generate permutations of a list of strings and their substrings

This algorithm has been escaping me for some time now. Lets say I'm given the string "cccaatt". I'm trying to generate all the possible variations of each substring of repeated letters. EG, "cccaatt" as an input would return:

cat, catt, caat, caatt, ccat, ccatt, ccaat, ccaatt, cccat, cccatt, cccaat, cccaatt

The order of the results does not matter, so long as it returns all of them. Generally, the input is a string, consisting of g groups of repeated letters, each group k_n letters long.

My intuition is that this is a recursive algorithm, but the exact structure of it has been difficult to understand.

Upvotes: 5

Views: 725

Answers (3)

Paddy3118
Paddy3118

Reputation: 4772

The Python itertools module has powerful tools to group and then to iterate on members of groups leading to the following program.

I have shown some intermediate results and used the pprint module to prettyprint the answer:

Python 2.7.3 (default, Aug  1 2012, 05:16:07) 
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import itertools
>>> instring = "cccaatt"
>>> [(x[0], list(x[1])) for x in itertools.groupby(instring)]
[('c', ['c', 'c', 'c']), ('a', ['a', 'a']), ('t', ['t', 't'])]
>>> xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
>>> xx
[['c', 'cc', 'ccc'], ['a', 'aa'], ['t', 'tt']]
>>> list(itertools.product(*xx))
[('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')]
>>> from pprint import pprint as pp
>>> pp(list(itertools.product(*xx)))
[('c', 'a', 't'),
 ('c', 'a', 'tt'),
 ('c', 'aa', 't'),
 ('c', 'aa', 'tt'),
 ('cc', 'a', 't'),
 ('cc', 'a', 'tt'),
 ('cc', 'aa', 't'),
 ('cc', 'aa', 'tt'),
 ('ccc', 'a', 't'),
 ('ccc', 'a', 'tt'),
 ('ccc', 'aa', 't'),
 ('ccc', 'aa', 'tt')]
>>> 

Or as a function:

>>> def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return list(itertools.product(*xx))

>>> pp(stringexpand("cccaatt"))
[('c', 'a', 't'),
 ('c', 'a', 'tt'),
 ('c', 'aa', 't'),
 ('c', 'aa', 'tt'),
 ('cc', 'a', 't'),
 ('cc', 'a', 'tt'),
 ('cc', 'aa', 't'),
 ('cc', 'aa', 'tt'),
 ('ccc', 'a', 't'),
 ('ccc', 'a', 'tt'),
 ('ccc', 'aa', 't'),
 ('ccc', 'aa', 'tt')]
>>> 

You seem to need the strings joined from their parts. This can be done in this slight mod:

def stringexpand(instring):
    xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)]
    return [''.join(parts) for parts in itertools.product(*xx)]

Which returns:

['cat',
 'catt',
 'caat',
 'caatt',
 'ccat',
 'ccatt',
 'ccaat',
 'ccaatt',
 'cccat',
 'cccatt',
 'cccaat',
 'cccaatt']

Upvotes: 1

chm
chm

Reputation: 1519

If you store the alphabet and maximum occurrences of each letter (as awesomely mentioned in the comments) you can do this:

function variations(letter_type, current string) {
    if (letter_type is in the alphabet) {
        while (string has fewer than the max amount of that letter) {
            add one of that letter to current string
            variations(next letter, current string)
        }
    } else {
        print current string // since there are no more letters to add
    }
}

In Java:

public class Variations {

    static String[] alphabet = {"c","a","t"};
    static int[] maximums = {3, 2, 2};

    public static void main(String[] args) {
        variations(0, "");
    }

    public static void variations(int letter_type, String curr) {
        if (letter_type < alphabet.length) {
            for (int i = 1; i <= maximums[letter_type]; i++) {
            curr += alphabet[letter_type];
            variations(letter_type+1, curr);
            }
        } else {
            System.out.println(curr);
        } 
    }

}

Upvotes: 2

dpington
dpington

Reputation: 1884

Decompose the string into a list of numbers and the number of repeats, i.e. "cccaatt" => [(c,3), (a,2), (t,2)]. then the problem could be defined recursively.

Let xs = [(a_1, n_1), (a_2, n_2), (a_3, n_3), ... (a_k, n_k)]
define Perm(xs):
    if len(xs) == 1:
        return all length variations of xs
    else:
        return every sequence in Perm(x[:-1]) appended with one or more from x[-1]

I'll have a python example shortly.

> perm("cccaatt")
> ['cat', 'ccat', 'cccat', 'caat', 'ccaat', 'cccaat', 'catt', 'ccatt', 'cccatt', 'caatt', 'ccaatt', 'cccaatt']

Code attached

def perm(xs):
    if not xs:
        return []

    # group them into the correct format, probably should have used groupby + zip
    l = [(xs[0],1)]
    for x in xs[1:]:
        last, num = l[-1]
        if last == x:
            l[-1] = (last, num+1)
        else:
            l.append((x, 1))

    # print(l)
    print(recurse(l))

# this is where the real work is done.
def recurse(xs):
    if len(xs) == 1:
        return [ xs[0][0] * x for x in range(1, xs[0][1] + 1) ]

    prev = recurse(xs[:-1])
    char, num = xs[-1]
    return [ y + x * char for x in range(1,num + 1) for y in prev ]

Upvotes: 1

Related Questions