Reputation: 430
Been reading a long time, first time I have not been able to find an answer to something I'm working on.
I have a list of 93 strings which are each 6 characters long. From those 93 strings, I want to identify a set of 20 which all meet a particular criteria relative to the others in the set. While itertools.combinations will give me all possible combinations, not all conditions are worth checking.
For instance if [list[0], list[1], etc] fails because list[0] and list[1] can not be together it doesn't matter what the other 18 strings are, the set will fail every time, and that is a ton of wasted checking.
Currently I have this working with 20 nested for loops, but it seems like there has to be a better/faster way to do it.:
for n1 in bclist:
building = [n1]
n2bclist = [bc for bc in bclist if bc not in building]
for n2 in n2bclist: #this is the start of what gets repeated 19 times
building.append(n2)
if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
building.remove(n2)
continue
n3bclist = [bc for bc in bclist if bc not in building]
#insert the additional 19 for loops, with n3 in n3, n4 in n4, etc
building.remove(n2)
There are print statements in the 20th for loop to alert me if a set of 20 even exists. The for statements at least allow me to skip sets early when the single addition fails, but there is no memory of when larger combinations fail:
For instance [list[0], list[1]]
fails, so skip to [list[0], [list[2]]
which passes. Next is [list[0], list[2], list[1]]
which will fail because 0 and 1 are together again so it will move to [list[0], list[2], list[3]]
which may or not pass. My concern is that eventually it will also test:
[list[0], list[3], list[2]]
[list[2], list[0], list[3]]
[list[2], list[3], list[0]]
[list[3], list[0], list[2]]
[list[3], list[2], list[0]]
All of these combinations will have the same outcome as the previous ones. Basically I trade the devil of itertools.combinations testing all combinations of sets which I know fail because of early values which fail for the devil of for loops which treat order of values as a factor when I do not care about their order. Both methods significantly increase the time it will take for my code to complete.
Any ideas on how to get rid of the devils would be greatly appreciated.
Upvotes: 3
Views: 746
Reputation: 157484
You should be basing your solution on itertools.combinations
as that will take care of the ordering issue; short-circuiting filtering is relatively easy to solve.
Let's quickly review how to implement combinations
works; the easiest way is to take the nested-for-loop approach and convert it to recursive style:
def combinations(iterable, r):
pool = tuple(iterable)
for i in range(0, len(pool)):
for j in range(i + 1, len(pool)):
...
yield (i, j, ...)
Converted to recursive form:
def combinations(iterable, r):
pool = tuple(iterable)
def inner(start, k, acc):
if k == r:
yield acc
else:
for i in range(start, len(pool)):
for t in inner(i + 1, k + 1, acc + (pool[i], )):
yield t
return inner(0, 0, ())
Applying a filter is now easy:
def combinations_filterfalse(predicate, iterable, r):
pool = tuple(iterable)
def inner(start, k, acc):
if predicate(acc):
return
elif k == r:
yield acc
else:
for i in range(start, len(pool)):
for t in inner(i + 1, k + 1, acc + (pool[i], )):
yield t
return inner(0, 0, ())
Let's check this:
>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
The actual implementation of itertools.combinations
listed in the documentation uses an iterative loop:
def combinations(iterable, r):
pool = tuple(iterable)
n = len(pool)
if r > n:
return
indices = range(r)
yield tuple(pool[i] for i in indices)
while True:
for i in reversed(range(r)):
if indices[i] != i + n - r:
break
else:
return
indices[i] += 1
for j in range(i+1, r):
indices[j] = indices[j-1] + 1
yield tuple(pool[i] for i in indices)
In order to fit in a predicate elegantly, it's necessary to reorder the loop slightly:
def combinations_filterfalse(predicate, iterable, r):
pool = tuple(iterable)
n = len(pool)
if r > n or predicate(()):
return
elif r == 0:
yield ()
return
indices, i = range(r), 0
while True:
while indices[i] + r <= i + n:
t = tuple(pool[k] for k in indices[:i+1])
if predicate(t):
indices[i] += 1
elif len(t) == r:
yield t
indices[i] += 1
else:
indices[i+1] = indices[i] + 1
i += 1
if i == 0:
return
i -= 1
indices[i] += 1
Checking again:
>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: t == (1, 4), range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: t[-1] == 3, range(5), 2))
[(0, 1), (0, 2), (0, 4), (1, 2), (1, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 0))
[()]
It turns out that the recursive solution is not only simpler but is also faster:
In [33]: timeit list(combinations_filterfalse_rec(lambda t: False, range(20), 5))
10 loops, best of 3: 24.6 ms per loop
In [34]: timeit list(combinations_filterfalse_it(lambda t: False, range(20), 5))
10 loops, best of 3: 76.6 ms per loop
Upvotes: 0
Reputation: 9868
You can make use of two aspects of the problem:
test_function(L)
is True
then test_function
of any sub-list of L
will also be True
You can also simplify things a bit by dealing with the indices 0-92 instead of list[0]
-list[92]
- it's only within test_function
that we might care what the contents of the list are.
The following code does that by first finding viable pairs, then sets of four, sets of eight, and sets of sixteen. Finally it finds all viable combinations of a sixteen and a four to get your list of 20. However there were over 100,000 sets of eight and so it was still way too slow and I gave up. Possibly you can do something along the same lines but speed it up with itertools
, but probably not enough.
target = range(5, 25)
def test_function(L):
for i in L:
if not i in target:
return True
def possible_combos(A, B):
"""
Find all possible pairings of a list within A and a list within B
"""
R = []
for i in A:
for j in B:
if i[-1] < j[0] and not test_function(i + j):
R.append(i + j)
return R
def possible_doubles(A):
"""
Find all possible pairings of two lists within A
"""
R = []
for n, i in enumerate(A):
for j in A[n + 1:]:
if i[-1] < j[0] and not test_function(i + j):
R.append(i + j)
return R
# First, find all pairs that are okay
L = range(92)
pairs = []
for i in L:
for j in L[i + 1:]:
if not test_function([i, j]):
pairs.append([i, j])
# Then, all pairs of pairs
quads = possible_doubles(pairs)
print "fours", len(quads), quads[0]
# Then all sets of eight, and sixteen
eights = possible_doubles(quads)
print "eights", len(eights), eights[0]
sixteens = possible_doubles(eights)
print "sixteens", len(sixteens), sixteens[0]
# Finally check all possible combinations of a sixteen plus a four
possible_solutions = possible_combos(sixteens, fours)
print len(possible_solutions), possible_solutions[0]
EDIT: I found a much better solution. First, identify all pairs of values within the range (0-92) that conform with test_function
, keeping the pairs in order. Presumably the 1st value of the 1st pair must be the 1st value of the solution, and the 2nd value of the last pair must be the last value of the solution (but check... is this assumption true for test_function
? If this isn't a safe assumption then you will need to repeat find_paths
for all the possible values of the start and finish). Then find a path from the 1st to last value that is 20 values long and also conforms to test_function
.
def test_function(S):
for i in S:
if not i in target:
return True
return False
def find_paths(p, f):
""" Find paths from end of p to f, check they are the right length,
and check they conform to test_function
"""
successful = []
if p[-1] in pairs_dict:
for n in pairs_dict[p[-1]]:
p2 = p + [n]
if (n == f and len(p2) == target_length and
not test_function(p2)):
successful.append(p2)
else:
successful += find_paths(p2, f)
return successful
list_length = 93 # this is the number of possible elements
target = [i * 2 for i in range(5, 25)]
# ^ this is the unknown target list we're aiming for...
target_length = len(target) # ... we only know its length
L = range(list_length - 1)
pairs = []
for i in L:
for j in L[i + 1:]:
if not test_function([i, j]):
pairs.append([i, j])
firsts = [a for a, b in pairs]
nexts = [[b for a, b in pairs if a == f] for f in firsts]
pairs_dict = dict(zip(firsts, nexts))
print "Found solution(s):", find_paths([pairs[0][0]], pairs[-1][1])
Upvotes: 0
Reputation: 500883
def gen(l, n, test, prefix=()):
if n == 0:
yield prefix
else:
for i, el in enumerate(l):
if not test(prefix + (el,)):
for sub in gen(l[i+1:], n - 1, test, prefix + (el,)):
yield sub
def test(l):
return sum(l) % 3 == 0 # just a random example for testing
print list(gen(range(5), 3, test))
This will choose subsets of cardinality n
from l
such that test(subset) == False
.
It tries to avoid unnecessary work. However, given that there are 1e20 ways to choose 20 elements out of 93, you may need to rethink your overall approach.
Upvotes: 1
Reputation: 208655
Use your current method, but keep track of indices as well so that in your inner loops you can skip elements you would have already checked:
bcenum = list(enumerate(bclist))
for i1, n1 in bcenum:
building = [n1]
for i2, n2 in bcenum[i1+1:]: #this is the start of what gets repeated 19 times
building.append(n2)
if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
building.remove(n2)
continue
for i3, n3 in bcenum[i2+1:]:
# more nested loops
building.remove(n2)
Upvotes: 1