Myzel394

Reputation: 1317

Python - Create every possibility for pair

I want to create every possibility from pairs of characters.

Example

Input:

"ab"

Output:

[["AABB", "AABb", "AaBB", "AaBb"],
 ["AABb", "AAbb", "AaBb", "Aabb"],
 ["AaBB", "AaBb", "aaBB", "aaBb"],
 ["AaBb", "Aabb", "aaBb", "aabb"]]

It's basically like this:

https://www.frustfrei-lernen.de/images/biologie/mendel-4.jpg.

I tried to use itertools, but with this method I get EVERY possibility ("aaaa", for example, which I don't want). Here's what I tried:

import itertools

def generate_product(l):
    yield from itertools.product(*([l] * len(l)))


characters = str(input("Enter your traits in lowercase letters. -> ")) # Get Input

splitted_characters = list(characters) # Split into list with chars
characters_list = splitted_characters + [a.upper() for a in splitted_characters] # create uppercase and lowercase chars


types = []
for x in generate_product(characters_list):
    types.append(["".join(x)])

for table in types:
    print(table)

For abc it would start with:

aabbcc

For a it's:

[["AA", "Aa"], ["Aa", "aa"]]

Upvotes: 2

Views: 585

Answers (4)

Michele Bastione

Reputation: 363

EDITED AGAIN

So, as my output wasn't 100% accurate, I realized that itertools.product does not actually suit this task too well. Therefore I implemented a function that does the job as desired:

from itertools import product 

def pretty_product(x,y):
    x,y,X,Y = x.lower(),y.lower(),x.upper(), y.upper()
    L = [X+Y, X+y, x+Y, x+y]
    E = [] 
    for i in product(L, L):
        f, s = i
        r = f[0]+s[0] if f[0]<s[0] else s[0]+f[0]
        r += f[1]+s[1] if f[1]<s[1] else s[1]+f[1]
        E.append(r)
    return E        

letters = 'ab' 

print([pretty_product(letters[0], letters[1])[n:n+4] for n in range(0, 16, 4)])

which prints

[['AABB', 'AABb', 'AaBB', 'AaBb'], ['AABb', 'AAbb', 'AaBb', 'Aabb'], ['AaBB', 'AaBb', 'aaBB', 'aaBb'], ['AaBb', 'Aabb', 'aaBb', 'aabb']]

Exactly as required.

Upvotes: 2

MyNameIsCaleb

Reputation: 4489

Borrowing from this answer and adjusting it for your use case, where you enter just the single characters (ab, abc, etc.), this keeps the order AaBbCcDd... and breaks the lists into Punnett square quadrants following the standard distribution, so that

AA, Aa
Aa, aa

and

AABB, AABb, AaBB, AaBb
AABb, AAbb, AaBb, Aabb
AaBB, AaBb, aaBB, aaBb
AaBb, Aabb, aaBb, aabb

will always be output in the proper order and quadrant (and likewise for any input size).

from itertools import chain, groupby, product

def punnett(ins):
    # first get the proper order of AaBbCc... based on input order not alphabetical
    order = ''.join(chain.from_iterable((x.upper(), x.lower()) for x in ins))
    # now get your initial square output by sorting on the index of letters from order
    # and using a lot of the same logic as other answers (and the linked source)
    ps = [''.join(sorted(''.join(e), key=lambda word: [order.index(c) for c in word]))
        for e in product(*([''.join(e) for e in product(*e)]
                    for e in zip(
                        [list(v) for _, v in groupby(order, key = str.lower)], 
                        [list(v) for _, v in groupby(order, key = str.lower)])))]
    outp = set()
    outx = []
    # Now to get your quadrants you need to do numbers
    #    from double the length of the input
    #    to the square of the length of that double
    for x in range(len(ins)*2, (len(ins)*2)**2, len(ins)):
        # using this range you need the numbers from your x minus your double (starting 0)
        # to your x minus the length
        # and since you are iterating by the length then will end up being your last x
        # Second you need starting at x and going up for the length
        # so for input of length 2 -> x=4 -> 0, 1, 4, 5
        # and next round -> x=6 -> 2, 3, 6, 7
        temp = [i for i in range(x - len(ins)*2, x - len(ins))] + [i for i in range(x, x+len(ins))]
        # and now since we need to never use the same index twice, we check to make sure none 
        # have been seen previously
        if all(z not in outp for z in temp):
            # use the numbers as indexes and put them into your list
            outx.append([ps[i] for i in temp])
            # add each individually to your set to check next time if we have seen it
            for z in temp:
                outp.add(z)
    return outx

So output (plus new lines to make it look like a standard matrix):

>>> punnett('ab')
[['AABB', 'AABb', 'AaBB', 'AaBb'], 
 ['AABb', 'AAbb', 'AaBb', 'Aabb'], 
 ['AaBB', 'AaBb', 'aaBB', 'aaBb'], 
 ['AaBb', 'Aabb', 'aaBb', 'aabb']]
>>> punnett('abc')
[['AABBCC', 'AABBCc', 'AABBCc', 'AABbCc', 'AABbcc', 'AABbCC'], 
 ['AABBcc', 'AABbCC', 'AABbCc', 'AABbCc', 'AABbCc', 'AABbcc'], 
 ['AAbbCC', 'AAbbCc', 'AAbbCc', 'AaBBCc', 'AaBBcc', 'AaBbCC'], 
 ['AAbbcc', 'AaBBCC', 'AaBBCc', 'AaBbCc', 'AaBbCc', 'AaBbcc'], 
 ['AaBbCC', 'AaBbCc', 'AaBbCc', 'AabbCc', 'Aabbcc', 'AaBBCC'], 
 ['AaBbcc', 'AabbCC', 'AabbCc', 'AaBBCc', 'AaBBCc', 'AaBBcc']]

There are certainly some efficiencies to be gained in this code to make it shorter, and you could generate your initial ps using any of the methods from the other posters if you wanted, assuming they generate the entries in the same order. You would still need to apply the ''.join(sorted(XXX, key...)) method to them to get the output you are looking for.
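One possible shorter formulation, offered as a sketch rather than a drop-in replacement for the function above: build each parent's gametes first, then fill the grid row by row. The name `punnett_square` is made up for illustration, and the sketch assumes both parents are heterozygous for every gene and that each gene is a single ASCII letter.

```python
from itertools import product

def punnett_square(genes):
    """Return the Punnett square for heterozygous parents as a list of rows."""
    # Each parent's gametes carry one allele per gene: 'AB', 'Ab', 'aB', 'ab'
    gametes = [''.join(g) for g in product(*((c.upper(), c.lower()) for c in genes))]
    # A cell pairs up the alleles gene by gene; sorted() puts 'A' before 'a'
    # because uppercase ASCII letters compare lower than lowercase ones
    return [
        [''.join(''.join(sorted(pair)) for pair in zip(row, col)) for col in gametes]
        for row in gametes
    ]

for row in punnett_square('ab'):
    print(row)
```

Rows correspond to one parent's gametes and columns to the other's, so for n genes the grid is 2^n by 2^n.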

Upvotes: 1

Mad Physicist

Reputation: 114300

You can use itertools.product just fine, but you need to define your input iterables correctly. You want to iterate over each upper/lowercase pair twice. For a 2x2 example, you would want something like

itertools.product('Aa', 'Aa', 'Bb', 'Bb')

Since this is a genetics problem, you can think of it as loops over the possibilities for each gene, repeated for each gene for each parent. The advantage of phrasing it like that is that if the parents' genotypes differ (that is, they are not all heterozygous), you can express that very easily. For example, something like:

itertools.product('AA', 'Aa', 'BB', 'bb')

Running a collections.Counter on the result of that will help you compute the statistics for the genotypes of the offspring.
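As a rough sketch of that idea: the helper name `genotype_counts` is made up for illustration, and sorting each allele pair so that 'aA' and 'Aa' count as one genotype is an assumption about what you want to tally.

```python
from collections import Counter
from itertools import product

def genotype_counts(*alleles):
    """Count offspring genotypes; arguments come in pairs, one per parent, for each gene."""
    counts = Counter()
    for combo in product(*alleles):
        # Group the flat tuple into per-gene pairs and sort each pair,
        # so that 'aA' and 'Aa' are tallied as the same genotype
        pairs = (''.join(sorted(combo[i:i + 2])) for i in range(0, len(combo), 2))
        counts[''.join(pairs)] += 1
    return counts

# Both parents heterozygous for both genes
print(genotype_counts('Aa', 'Aa', 'Bb', 'Bb'))

# Parents with different genotypes, as in the second product call above:
# AABb and AaBb each occur 8 times out of 16
print(genotype_counts('AA', 'Aa', 'BB', 'bb'))
```

Dividing each count by the total number of combinations gives the expected proportion of each genotype.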

But the question remains as to how to do this using itertools. Repeating the elements of an iterable N times can be achieved with itertools.chain.from_iterable and itertools.repeat:

itertools.chain.from_iterable(itertools.repeat(x, 2) for x in ('Aa', 'Bb'))

The resulting iterator can be passed into itertools.product directly:

from itertools import chain, product, repeat

def table_entries(genes):
    possibilities = product(*chain.from_iterable(repeat((g.upper(), g.lower()), 2) for g in genes))
    return [''.join(possibility) for possibility in possibilities]

This works for arbitrary numbers of genes, regardless of your original capitalization:

>>> table_entries('ab')
['AABB',
 'AABb',
 'AAbB',
 'AAbb',
 'AaBB',
 'AaBb',
 'AabB',
 'Aabb',
 'aABB',
 'aABb',
 'aAbB',
 'aAbb',
 'aaBB',
 'aaBb',
 'aabB',
 'aabb']
>>> table_entries('AbC')
['AABBCC',
 'AABBCc',
 'AABBcC',
 'AABBcc',
 'AABbCC',
 ...
 'aabBcc',
 'aabbCC',
 'aabbCc',
 'aabbcC',
 'aabbcc']

Upvotes: 3

blhsing

Reputation: 106543

You would have to nest itertools.product like this:

from itertools import product

list(map(''.join, product(*(map(''.join, product((c.upper(), c), repeat=2)) for c in 'ab'))))

This returns:

['AABB',
 'AABb',
 'AAbB',
 'AAbb',
 'AaBB',
 'AaBb',
 'AabB',
 'Aabb',
 'aABB',
 'aABb',
 'aAbB',
 'aAbb',
 'aaBB',
 'aaBb',
 'aabB',
 'aabb']
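To make the nesting easier to follow, the one-liner can be unrolled into two steps. This is only a sketch of the same idea; the names `genes`, `per_gene`, and `entries` are introduced here for illustration.

```python
from itertools import product

genes = 'ab'

# Inner product: every ordered pair of alleles for a single gene,
# e.g. 'AA', 'Aa', 'aA', 'aa' for the gene 'a'
per_gene = [
    [''.join(pair) for pair in product((c.upper(), c), repeat=2)]
    for c in genes
]

# Outer product: pick one allele pair per gene and concatenate them
entries = [''.join(combo) for combo in product(*per_gene)]

print(per_gene[0])
print(len(entries))
```

Note that ordered pairs such as 'aA' and 'Aa' are kept distinct here, which is why entries like 'AAbB' appear in the output.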

Upvotes: 0
