Elliot Roberts
Elliot Roberts

Reputation: 940

Most efficient way to find the number of possible unique fixed length permutations?

I've got this dictionary:

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
    }

I need to find the max number of 3 long combinations of the first values of each of these tuples, where only the values of each key can proceed said key.

My current code for finding all unique (3 long) combinations is this:

ans_set = set()
for x in num_dict:
    for y in num_dict[x]:
        for z in num_dict[y]:
            ans_set.add((x[0], y[0], z[0]))
return len(ans_set)

This returns 10 and ans_set ends up being:

{
 (2, 2, 2), (1, 2, 2), (1, 4, 4),
 (2, 2, 4), (1, 1, 2), (4, 4, 4),
 (1, 2, 4), (1, 1, 4), (1, 1, 1),
 (2, 4, 4)
}

But I don't actually care about what the sets are, just the number of them

This method is not particularly efficient as it actually generates every possible combination and puts it in a set.

I don't need to know each unique combination, I just need to know how many there are.

I have a feeling this can be done, maybe using the lengths of the value lists? but I am having trouble wrapping my head around it.

Clarifying questions about what I need are welcome as I realize I might not have explained it in the most clear fashion.

Final Edit

I've found the best way to find the number of triples by reevaluating what i needed it to do. This method doesn't actually find the triples, it just counts them.

def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total

And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):

def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly diveded by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)

Upvotes: 0

Views: 80

Answers (1)

vishes_shell
vishes_shell

Reputation: 23554

If i get you right:

from itertools import combinations

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
    }
set(combinations([k[0] for k in num_dict.keys()], 3))

Output:

{(1, 4, 1),
 (2, 1, 1),
 (2, 1, 4),
 (2, 2, 1),
 (2, 2, 4),
 (2, 4, 1),
 (2, 4, 4),
 (4, 1, 1),
 (4, 1, 4),
 (4, 4, 1)}

And len() is 10

So basically what you're would do, make all combinations with itertools.combinations, from first elements of dict keys with a length 3 and then get set to eliminate repeating elements.

UPDATE

Since you updated the question with desired output data

You can do the following

from itertools import combinations_with_replacement
list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3))

Output:

[(1, 1, 1),
 (1, 1, 2),
 (1, 1, 4),
 (1, 2, 2),
 (1, 2, 4),
 (1, 4, 4),
 (2, 2, 2),
 (2, 2, 4),
 (2, 4, 4),
 (4, 4, 4)]

UPD2

So about time consumption i've ran it

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
    }
def a(num_dict):
    ans_set = set()
    for x in num_dict:
        for y in num_dict[x]:
            for z in num_dict[y]:
                ans_set.add((x[0], y[0], z[0]))
    return len(ans_set)
def b(num_dict):
    from itertools import combinations_with_replacement
    return len(list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3)))
%timeit a(num_dict)
%timeit b(num_dict)

And the the results are:

The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 12.1 µs per loop

The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.77 µs per loop

So solution that i've presented here is 2x times faster.

Upvotes: 1

Related Questions