Majid
Majid

Reputation: 261

Iterate over all lists with one duplicate

Consider all lists of length n where the values are integers from the range 0 to n-1. I would like to iterate as quickly as possible over every possible such list that has exactly one duplicate. If n = 2 then the possible lists are:

00
11

If n = 3 they are:

001
002
110
112
220
221
100
200
011
211
022
122
010
020
101
121
202
212

The total number of such lists is n! * (n choose 2) so I would like to avoid storing them all in memory if at all possible.

For each list I want to call a function that calculates a function of the list.
What's a good way to do this?

Benchmarks On my computer I get the following benchmark results

timeit all(one_duplicate(6))
100 loops, best of 3: 6.87 ms per loops
timeit all(one_duplicate(7))
10 loops, best of 3: 59 ms per loop
timeit all(one_duplicate(8))
1 loops, best of 3: 690 ms per loop
timeit all(one_duplicate(9))
1 loops, best of 3: 7.58 s per loop

and

timeit all(duplicates(range(6)))
10 loops, best of 3: 22.8 ms per loop
timeit all(duplicates(range(7)))
1 loops, best of 3: 416 ms per loop    
timeit all(duplicates(range(8)))
1 loops, best of 3: 9.78 s per loop

and

timeit all(all_doublet_tuples(6))
100 loops, best of 3: 6.36 ms per loop
timeit all(all_doublet_tuples(7))
10 loops, best of 3: 60 ms per loop
timeit all(all_doublet_tuples(8))
1 loops, best of 3: 542 ms per loop
timeit all(all_doublet_tuples(9))
1 loops, best of 3: 7.27 s per loop

I declare all_doublet_tuples and one_duplicate to be equal first at the moment (allowing for noise in the tests).

Upvotes: 4

Views: 184

Answers (5)

fsw
fsw

Reputation: 3695

import itertools
n=3
for i in range(n-1):
    for j in range(n-i-1):
        for perm in map(list, itertools.permutations(range(n), n-1)):
            perm.insert(i, perm[i+j])
            print perm

key concept here is to generate all N-1 length permutations from a N-length set and then just run it over all repetition combinations (i and i+j)

In case of N=3, permutations are:

(0, 1) (0, 2) (1, 0) (1, 2) (2, 0) (2, 1)

and repeated elements can be at positions marked by X:

XX_, X_X, _XX

Your results is the 'multiplication' of those sets:

        XX_  X_X  _XX
(0, 1)  001  010  011
(0, 2)  002  020  022
(1, 0)  110  101  100
(1, 2)  112  121  122
(2, 0)  220  202  200
(2, 1)  221  212  211

Everything is computed on the fly. I guess memory consumption is O(N).

Upvotes: 2

pemistahl
pemistahl

Reputation: 9584

from itertools import product    

def duplicates(r):
    r_len = len(r)
    dup_len = r_len - 1
    for tup in product(r, repeat=r_len):
        if len(set(tup)) == dup_len:
            yield tup

>>> list(duplicates(range(2)))
[(0, 0), (1, 1)]

>>> list(duplicates(range(3)))
[(0, 0, 1),
 (0, 0, 2),
 (0, 1, 0),
 (0, 1, 1),
 (0, 2, 0),
 (0, 2, 2),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 2),
 (1, 2, 1),
 (1, 2, 2),
 (2, 0, 0),
 (2, 0, 2),
 (2, 1, 1),
 (2, 1, 2),
 (2, 2, 0),
 (2, 2, 1)]

Some benchmarks on my machine:

%timeit all(duplicates(range(5)))
100 loops, best of 3: 2.5 ms per loop

%timeit all(duplicates(range(6)))
10 loops, best of 3: 38.6 ms per loop

%timeit all(duplicates(range(7)))
1 loops, best of 3: 766 ms per loop

%timeit all(duplicates(range(8)))
1 loops, best of 3: 18.8 s per loop

Upvotes: 1

tzelleke
tzelleke

Reputation: 15345

Think of any of the desired output tuples in this way: It was derived by duplicating the first (lets say from the left) doublet elem and inserting it at the position of the second doublet.

Therefore my solution is basically this two-step process:

  1. generate all n-1 permutations of range(n)
  2. for each tuple from 1.:
    take each single element and insert it at each position ahead (to avoid duplicates) including at the very end

import itertools as it

# add_doublet accomplishes step 2
def add_doublet(t):
    for i in range(len(t)):
        for j in range(i+1, len(t)+1):
            yield t[:j] + (t[i],) + t[j:]


def all_doublet_tuples(n):
    for unique_tuple in it.permutations(range(n), n-1):
        for doublet_tuple in add_doublet(unique_tuple):
            yield doublet_tuple



from pprint import pprint

n = 3
pprint(list(all_doublet_tuples(n)))

I do not print the output here cause its kind of lenghty ... and the case for n=3 is boring.

Upvotes: 5

Andrew Clark
Andrew Clark

Reputation: 208395

import itertools

def one_duplicate(n):
    nrange = list(range(n))
    for x in nrange:
        without_x = nrange[:x] + nrange[x+1:]
        for i, j in itertools.combinations(nrange, 2):
            for others in map(list, itertools.permutations(without_x, n-2)):
                others.insert(i, x)
                others.insert(j, x)
                yield others

>>> list(one_duplicate(2))
[[0, 0], [1, 1]]
>>> list(one_duplicate(3))
[[0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 2, 0], [1, 0, 0], [2, 0, 0], [1, 1, 0], [1, 1, 2], [1, 0, 1], [1, 2, 1], [0, 1, 1], [2, 1, 1], [2, 2, 0], [2, 2, 1], [2, 0, 2], [2, 1, 2], [0, 2, 2], [1, 2, 2]]
>>> list(one_duplicate(4))
[[0, 0, 1, 2], [0, 0, 1, 3], [0, 0, 2, 1], [0, 0, 2, 3], [0, 0, 3, 1], [0, 0, 3, 2], [0, 1, 0, 2], [0, 1, 0, 3], [0, 2, 0, 1], [0, 2, 0, 3], [0, 3, 0, 1], [0, 3, 0, 2], [0, 1, 2, 0], [0, 1, 3, 0], [0, 2, 1, 0], [0, 2, 3, 0], [0, 3, 1, 0], [0, 3, 2, 0], [1, 0, 0, 2], [1, 0, 0, 3], [2, 0, 0, 1], [2, 0, 0, 3], [3, 0, 0, 1], [3, 0, 0, 2], [1, 0, 2, 0], [1, 0, 3, 0], [2, 0, 1, 0], [2, 0, 3, 0], [3, 0, 1, 0], [3, 0, 2, 0], [1, 2, 0, 0], [1, 3, 0, 0], [2, 1, 0, 0], [2, 3, 0, 0], [3, 1, 0, 0], [3, 2, 0, 0], [1, 1, 0, 2], [1, 1, 0, 3], [1, 1, 2, 0], [1, 1, 2, 3], [1, 1, 3, 0], [1, 1, 3, 2], [1, 0, 1, 2], [1, 0, 1, 3], [1, 2, 1, 0], [1, 2, 1, 3], [1, 3, 1, 0], [1, 3, 1, 2], [1, 0, 2, 1], [1, 0, 3, 1], [1, 2, 0, 1], [1, 2, 3, 1], [1, 3, 0, 1], [1, 3, 2, 1], [0, 1, 1, 2], [0, 1, 1, 3], [2, 1, 1, 0], [2, 1, 1, 3], [3, 1, 1, 0], [3, 1, 1, 2], [0, 1, 2, 1], [0, 1, 3, 1], [2, 1, 0, 1], [2, 1, 3, 1], [3, 1, 0, 1], [3, 1, 2, 1], [0, 2, 1, 1], [0, 3, 1, 1], [2, 0, 1, 1], [2, 3, 1, 1], [3, 0, 1, 1], [3, 2, 1, 1], [2, 2, 0, 1], [2, 2, 0, 3], [2, 2, 1, 0], [2, 2, 1, 3], [2, 2, 3, 0], [2, 2, 3, 1], [2, 0, 2, 1], [2, 0, 2, 3], [2, 1, 2, 0], [2, 1, 2, 3], [2, 3, 2, 0], [2, 3, 2, 1], [2, 0, 1, 2], [2, 0, 3, 2], [2, 1, 0, 2], [2, 1, 3, 2], [2, 3, 0, 2], [2, 3, 1, 2], [0, 2, 2, 1], [0, 2, 2, 3], [1, 2, 2, 0], [1, 2, 2, 3], [3, 2, 2, 0], [3, 2, 2, 1], [0, 2, 1, 2], [0, 2, 3, 2], [1, 2, 0, 2], [1, 2, 3, 2], [3, 2, 0, 2], [3, 2, 1, 2], [0, 1, 2, 2], [0, 3, 2, 2], [1, 0, 2, 2], [1, 3, 2, 2], [3, 0, 2, 2], [3, 1, 2, 2], [3, 3, 0, 1], [3, 3, 0, 2], [3, 3, 1, 0], [3, 3, 1, 2], [3, 3, 2, 0], [3, 3, 2, 1], [3, 0, 3, 1], [3, 0, 3, 2], [3, 1, 3, 0], [3, 1, 3, 2], [3, 2, 3, 0], [3, 2, 3, 1], [3, 0, 1, 3], [3, 0, 2, 3], [3, 1, 0, 3], [3, 1, 2, 3], [3, 2, 0, 3], [3, 2, 1, 3], [0, 3, 3, 1], [0, 3, 3, 2], [1, 3, 3, 0], [1, 3, 3, 2], [2, 3, 3, 0], [2, 3, 3, 1], [0, 3, 1, 3], [0, 3, 2, 3], [1, 3, 0, 3], [1, 3, 2, 3], [2, 3, 0, 3], [2, 3, 1, 3], [0, 1, 3, 3], [0, 2, 3, 3], [1, 0, 3, 3], [1, 2, 3, 3], [2, 0, 3, 3], [2, 1, 3, 3]]

Here is a description of the algorithm:

  • Find all the index pairs using itertools.combinations(nrange, 2)
  • For each number x in the range, get all permutations of length n-2 for the entire range except for x
  • In each of these permutations, insert x at each of the indices found in the first step and yield the result.

Timing comparisons between my answer and Peter Stahl's:

# Just showing they both get the same result
In [23]: set(peter(range(6))) == set(tuple(x) for x in fj(6))
Out[23]: True

In [24]: %timeit list(peter(range(6)))
10 loops, best of 3: 27.3 ms per loop

In [25]: %timeit list(fj(6))
100 loops, best of 3: 8.32 ms per loop

In [26]: %timeit list(peter(range(7)))
1 loops, best of 3: 506 ms per loop

In [27]: %timeit list(fj(7))
10 loops, best of 3: 81 ms per loop

Upvotes: 3

abarnert
abarnert

Reputation: 365587

Your questions are about how to do so without creating an actual list in memory, and how to run a function on each one, so I'll answer that part generically first, then for your specific problem.

The total number of possible lists is n*(n-1) * (n choose 2) so I would like to avoid storing them all in memory if at all possible.

You just need to create a generator function, which yields answers one by one, instead of a regular function that appends each answer to a list and then returns that list.

For each list I want to call a function that calculates a function of the list. What's a good way to do this?

You can call a generator function's and use the result as an iterator. For example:

for element in my_generator_function(n):
    my_function(element)

If you want the results of calling my_function on each element, you can build another generator function:

def apply_to(my_function, my_iterator):
    for element in my_iterator):
        yield my_function(element)

However, there's already a built-in function that does exactly that, called map in Python 3, or itertools.imap in Python 2.

Or, even more simply, you can use a generator expression:

results = (my_function(element) for element in my_generator_function(n))

For more, see the Iterators, Generators, and Generator Expressions section in the official Python tutorial.

Now, you can break your problem down like this:

  • For each number in the set of the first n numbers:
    • All permutations of that number twice, together with n-2 of the other numbers

For n=2, this means all permutations of (0, 0) together with 0 numbers from (1), and all permutations of (1, 1) together with 0 numbers from (0). For n=3, you've got all permutations of (0, 0) together with 1 of the numbers (1, 2), and all permutations of (1, 1) together with 1 of the numbers (0, 2), and all permutations of (2, 2) together with 1 of the numbers (0, 1). And so on.

So, translating that directly to Python, using the magic of itertools:

def one_duplicate_element(n):
    for i in range(n):
        rest = (j for j in range(n) if j != i)
        for combination in itertools.combinations(rest, n-2):
            yield from itertools.permutations(combination + (i, i))

This uses the yield from statement in Python 3.3, which means "yield every element from this iterator". If you're using 3.2 or earlier, you need to do that explicitly with a loop:

def one_duplicate_element(n):
    for i in range(n):
        rest = (j for j in range(n) if j != i)
        for combination in itertools.combinations(rest, n-2):
            for permutation in itertools.permutations(combination + (i, i)):
                yield permutation

But you'll notice that, while this does exactly what you asked for, it doesn't generate the same list as your sample output. Even ignoring ordering (which I don't think is relevant for you), there are duplicates. Why?

Well, what are the permutations of the set (0, 0)? That's a trick question; (0, 0) can't be a set, because it has duplicates in it. So, whatever (0, 0) is, what are its permutations? Well, by the usual definition of permutations on ordered tuples, what would be ((0, 0), (0, 0)). And that's exactly what itertools.permutations gives you.

You can look at how permutations works and write a permutations_without_duplicates function yourself, or you can use the unique_everseen function in the Recipes section of the docs, or you can sort the results and use the unique_justseen, or… Well, let's pick one:

def one_duplicate_element_no_duplicates(n):
    yield from unique_everseen(one_duplicate_element(n))

Upvotes: 1

Related Questions