Reputation: 940
I've got this dictionary:
num_dict = {
(2, 3): [(2, 2), (4, 4), (4, 5)],
(2, 2): [(2, 3), (4, 4), (4, 5)],
(4, 5): [(4, 4)],
(1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
(4, 4): [(4, 5)],
(1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
}
I need to find the max number of 3 long combinations of the first values of each of these tuples, where only the values of each key can proceed said key.
My current code for finding all unique (3 long) combinations is this:
ans_set = set()
for x in num_dict:
for y in num_dict[x]:
for z in num_dict[y]:
ans_set.add((x[0], y[0], z[0]))
return len(ans_set)
This returns 10
and ans_set
ends up being:
{
(2, 2, 2), (1, 2, 2), (1, 4, 4),
(2, 2, 4), (1, 1, 2), (4, 4, 4),
(1, 2, 4), (1, 1, 4), (1, 1, 1),
(2, 4, 4)
}
But I don't actually care about what the sets are, just the number of them
This method is not particularly efficient as it actually generates every possible combination and puts it in a set.
I don't need to know each unique combination, I just need to know how many there are.
I have a feeling this can be done, maybe using the lengths of the value lists? but I am having trouble wrapping my head around it.
Clarifying questions about what I need are welcome as I realize I might not have explained it in the most clear fashion.
I've found the best way to find the number of triples by reevaluating what i needed it to do. This method doesn't actually find the triples, it just counts them.
def foo(l):
llen = len(l)
total = 0
cache = {}
for i in range(llen):
cache[i] = 0
for x in range(llen):
for y in range(x + 1, llen):
if l[y] % l[x] == 0:
cache[y] += 1
total += cache[x]
return total
And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):
def bar(l):
list_length = len(l)
total_triples = 0
cache = {}
for i in range(list_length):
cache[i] = 0
for x in range(list_length):
print("\n\nfor index[{}]: {}".format(x, l[x]))
for y in range(x + 1, list_length):
print("\n\ttry index[{}]: {}".format(y, l[y]))
if l[y] % l[x] == 0:
print("\n\t\t{} can be evenly diveded by {}".format(l[y], l[x]))
cache[y] += 1
total_triples += cache[x]
print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
print("\t\tcount is now {}".format(total_triples))
print("\t\t(+{} from cache[{}])".format(cache[x], x))
else:
print("\n\t\tfalse")
print("\ntotal number of triples:", total_triples)
Upvotes: 0
Views: 80
Reputation: 23554
If i get you right:
from itertools import combinations
num_dict = {
(2, 3): [(2, 2), (4, 4), (4, 5)],
(2, 2): [(2, 3), (4, 4), (4, 5)],
(4, 5): [(4, 4)],
(1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
(4, 4): [(4, 5)],
(1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
}
set(combinations([k[0] for k in num_dict.keys()], 3))
Output:
{(1, 4, 1),
(2, 1, 1),
(2, 1, 4),
(2, 2, 1),
(2, 2, 4),
(2, 4, 1),
(2, 4, 4),
(4, 1, 1),
(4, 1, 4),
(4, 4, 1)}
And len()
is 10
So basically what you're would do, make all combinations with itertools.combinations
, from first elements of dict keys with a length 3 and then get set
to eliminate repeating elements.
UPDATE
Since you updated the question with desired output data
You can do the following
from itertools import combinations_with_replacement
list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3))
Output:
[(1, 1, 1),
(1, 1, 2),
(1, 1, 4),
(1, 2, 2),
(1, 2, 4),
(1, 4, 4),
(2, 2, 2),
(2, 2, 4),
(2, 4, 4),
(4, 4, 4)]
UPD2
So about time consumption i've ran it
num_dict = {
(2, 3): [(2, 2), (4, 4), (4, 5)],
(2, 2): [(2, 3), (4, 4), (4, 5)],
(4, 5): [(4, 4)],
(1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
(4, 4): [(4, 5)],
(1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
}
def a(num_dict):
ans_set = set()
for x in num_dict:
for y in num_dict[x]:
for z in num_dict[y]:
ans_set.add((x[0], y[0], z[0]))
return len(ans_set)
def b(num_dict):
from itertools import combinations_with_replacement
return len(list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3)))
%timeit a(num_dict)
%timeit b(num_dict)
And the the results are:
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 12.1 µs per loop
The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.77 µs per loop
So solution that i've presented here is 2x times faster.
Upvotes: 1