Reputation: 165
I'm working with a bit of a riddle:
Given a dictionary with tuples for keys: dictionary = {(p,q):n}
, I need to generate a list of new dictionaries of every combination such that neither p nor q repeat within the new dictionary. And during the generation of this list of dictionaries, or after, pick one of the dictionaries as the desired one based on a calculation using the dictionary values.
example of what I mean (but much smaller):
dictionary = {(1,1): 1.0, (1,2): 2.0, (1,3): 2.5, (1,4): 5.0, (2,1): 3.5, (2,2): 6.0, (2,3): 4.0, (2,4): 1.0}
becomes
listofdictionaries = [{(1,1): 1.0, (2,2): 6.0}, {(1,1): 1.0, (2,3): 4.0}, (1,1): 1.0, (2,4): 1.0}, {(1,2): 2.0, (2,1): 3.5}, {(1,2): 2.0, (2,3): 4.0},
etc.
a dictionary like: {(1,1): 1.0, (2,1): 3.5}
is not allowable because q repeats.
Now my sob story: I'm brand new to coding... but I've been trying to write this script to analyze some of my data. But I also think it's an interesting algorithm riddle. I wrote something that works with very small dictionaries but when I input a large one, it takes way too long to run (copied below). In my script attempt, I actually generated a list of combinations of tuples instead that I use to refer to my master dictionary later on in the script. I'll copy it below:
The dictionary tuple keys were generated using two lists: "ExpList1" and "ExpList2"
#first, I generate all the tuple combinations from my ExpDict dictionary
combos =(itertools.combinations(ExpDict,min(len(ExpList1),len(ExpList2))))
#then I generate a list of only the combinations that don't repeat p or q
uniquecombolist = []
for foo in combos:
counter = 0
listofp = []
listofq = []
for bar in foo:
if bar[0] in listofp or bar[1] in listofq:
counter=+1
break
else:
listofp.append(bar[0])
listofq.append(bar[1])
if counter == 0:
uniquecombolist.append(foo)
After generating this list, I apply a function to all of the dictionary combinations (iterating through the tuple lists and calling their respective values from the master dictionary) and pick the combination with the smallest resulting value from that function.
I also tried to apply the function while iterating through the combinations picking the unique p,q ones and then checking whether the resulting value is smaller than the previous and keeping it if it is (this is instead of generating that list "uniquecombolist", I end up generating just the final tuple list) - still takes too long.
I think the solution lies in embedding the p,q-no-repeat and the final selecting function DURING the generation of combinations. I'm just having trouble wrapping my head around how to actually do this.
Thanks for reading! Sara
EDIT:
To clarify, I wrote an alternative to my code that incorporates the final function (basically root mean squares) to the sets of pairs.
`combos =(itertools.combinations(ExpDict,min(len(ExpList1),len(ExpList2))))
prevRMSD = float('inf')
for foo in combos:
counter = 0
distanceSUM = 0
listofp = []
listofq = []
for bar in foo:
if bar[0] in listofp or bar[1] in listofq:
counter=+1
break
else:
listofp.append(bar[0])
listofq.append(bar[1])
distanceSUM = distanceSUM + RMSDdict[bar]
RMSD = math.sqrt (distanceSUM**2/len(foo))
if counter == 0 and RMSD< prevRMSD:
chosencombo = foo
prevRMSD = RMSD`
So if I could incorporate the RMS calculation during the set generation and only keep the smallest one, I think that will solve my combinatorial problem.
Upvotes: 9
Views: 3849
Reputation: 9008
This answer assume that you are trying to generate sets with |S| elements, where S is the smaller pool of tuple coordinates. The larger pool will be denoted L.
Since the set will contain |S| pairs with no repeated elements, each element from S must occur exactly once. From here, match up the permutations of L where |S| elements are chosen with the ordered elements of S. This will generate all requested sets exhaustively and without repetition.
Note that P(|L|, |S|) is equal to |L|!/(|L|-|S|)!
Depending on the sizes of the tuple coordinate pools, there may be too many permutations to enumerate.
Some code to replicate this enumeration might look like:
from itertools import permutations
S, L = range(2), range(4) # or ExpList1, ExpList2
for p in permutations(L, len(S)):
print(zip(S, p))
In total, your final code might look something like:
S, L = ExpList1, ExpList2
pairset_maker = lambda p: zip(S, p)
if len(S) > len(L):
S, L = L, S
pairset_maker = lambda p: zip(p, S)
n = len(S)
get_perm_value = lambda p: math.sqrt(sum(RMSDdict[t] for t in pairset_maker(p))**2/n)
min_pairset = min(itertools.permutations(L, n), key=get_perm_value)
If this doesn't get you to within an order or magnitude or two of your desired runtime, then you might need to consider an algorithm that doesn't produce an optimal solution.
Upvotes: 1
Reputation: 1323
If I understood your problem, you are interested in all the possible combinations of pairs (p,q) with unique p's and q's respecting a given set of possible values for p's and q's. In my answer I assume those possible values are, respectively, in list_p
and list_q
(I think this is what you have in ExpList1
and ExpList2
am I right?)
min_size = min(len(list_p), len(list_q))
combos_p = itertools.combinations(list_p, min_size)
combos_q = itertools.permutations(list_q, min_size)
prod = itertools.product(combos_p, combos_q)
uniquecombolist = [tuple(zip(i[0], i[1])) for i in prod]
Let me know if this is what you're looking for. By the way welcome to SO, great question!
Edit:
If you're concerned that your list may become enormous, you can always use a generator expression and apply whatever function you desire to it, e.g.,
min_size = min(len(list_p), len(list_q))
combos_p = itertools.combinations(list_p, min_size)
combos_q = itertools.permutations(list_q, min_size)
prod = itertools.product(combos_p, combos_q)
uniquecombo = (tuple(zip(y[0], y[1])) for y in prod) # this is now a generator expression, not a list -- observe the parentheses
def your_function(x):
# do whatever you want with the values, here I'm just printing and returning
print(x)
return x
# now prints the minimum value
print(min(itertools.imap(your_function, uniquecombo)))
When you use generators instead of lists, the values are computed as they are needed. Here since we're interested in the minimum value, each value is computed and is discarded right away unless it is the minimum.
Upvotes: 1