Reputation: 48416

How to get the Cartesian product of multiple lists

How can I get the Cartesian product (every possible combination of values) from a group of lists?

For example, given

somelists = [
   [1, 2, 3],
   ['a', 'b'],
   [4, 5]
]

How do I get this?

[(1, 'a', 4), (1, 'a', 5), (1, 'b', 4), (1, 'b', 5), (2, 'a', 4), (2, 'a', 5), ...]

_{One common application for this technique is to avoid deeply nested loops. See Avoiding nested for loops for a more specific duplicate. Similarly, this technique might be used to "explode" a dictionary with list values; see Combine Python Dictionary Permutations into List of Dictionaries .}

_{If you want a Cartesian product of the same list with itself multiple times, itertools.product can handle that elegantly. See Operation on every pair of element in a list or How can I get "permutations with repetitions" from a list (Cartesian product of a list with itself)?.}

_{Many people who already know about itertools.product struggle with the fact that it expects separate arguments for each input sequence, rather than e.g. a list of lists. The accepted answer shows how to handle this with *. However, the use of * here to unpack arguments is fundamentally not different from any other time it's used in a function call. Please see Expanding tuples into arguments for this topic (and use that instead to close duplicate questions, as appropriate).}

Upvotes: 543

Answers (20)

Jai

Reputation: 3310

Recursive Approach:

def rec_cart(start, array, partial, results):
  if len(partial) == len(array):
    results.append(partial)
    return 

  for element in array[start]:
    rec_cart(start+1, array, partial+[element], results)

rec_res = []
some_lists = [[1, 2, 3], ['a', 'b'], [4, 5]]  
rec_cart(0, some_lists, [], rec_res)
print(rec_res)

Iterative Approach:

def itr_cart(array):
  results = [[]]
  for i in range(len(array)):
    temp_res = []
    for res in results:
      for element in array[i]:
        temp_res.append(res+[element])
    results = temp_res
  
  return results

some_lists = [[1, 2, 3], ['a', 'b'], [4, 5]]  
itr_res = itr_cart(some_lists)
print(itr_res)

Upvotes: 10

Hans

Reputation: 520

This code here is 6 times faster than itertools.product and uses 10 times less memory.

It uses only very basic operators (programming class in middle school) and that is why it is so efficient.

Here is an explanation how I came up with it

Some background first:

Probably you learned binary numbers in school/uni and did a lot of exercises to get more comfortable in understanding this system. It is strange leading with a limited set of numbers, right? That means you have already created a lot of Cartesian products and have some good practice.

What method did you use to do that?

Maybe, your teacher told you to use mainly these 3 operators: / and % and * that was all what you needed (at least in my case, it was).

Let's take a look at a code and the corresponding result of a Cartesian product:

This code: product(range(0,5),range(0,7),range(0,8)) generates that: 


(0, 0, 0),
(0, 0, 1),
(0, 0, 2),
(0, 0, 3),
(0, 0, 4),
(0, 0, 5),
(0, 0, 6),
(0, 0, 7),
(0, 1, 0),
(0, 1, 1),
(0, 1, 2),
(0, 1, 3),
(0, 1, 4),
(0, 1, 5),
(0, 1, 6),
(0, 1, 7),
(0, 2, 0),
(0, 2, 1),
(0, 2, 2),
(0, 2, 3),
(0, 2, 4),
(0, 2, 5),
(0, 2, 6),
(0, 2, 7),
(0, 3, 0),
..

It looks almost the same, right? Almost like the binary table. When I noticed that, I got a feeling that there were great chances to write an algorithm using mainly / and % and * - no recursion, no concatenation of lists, no generators - nothing, just 7th grade computer basics ...
And it was surprisingly easy. The only difference there is compared to the binary data is that the base changes often [different length of input data] in every column. When the length of the first input iterator is reached [ range(0,8) ], the second column starts all of a sudden, but not with 8 like in our decimal system. It is 10 after 7, because the base of the first column is 8 [ range(0,8) ], . And there is nothing left after 7

(0, 0, 0),
(0, 0, 1),
(0, 0, 2),
(0, 0, 3),
(0, 0, 4),
(0, 0, 5),
(0, 0, 6),
(0, 0, 7),
(0, 1, 0),
(0, 1, 1),

Looking again at the code:

Now it becomes very clear what the bases of the other columns are. (If it's not that clear, use len() )

   (range(0,5),range(0,7),range(0,8))


And this is all what you need to get it going:

  from cython.parallel cimport prange
  cimport cython
  import numpy as np
  cimport numpy as np
  import cython
  ctypedef fused real:
      cython.char
  
      .... MORE DATA TYPES
      cython.longdoublecomplex
      cython.Py_hash_t
      cython.Py_UCS4
  cpdef void create_product(cython.Py_ssize_t liste, cython.Py_ssize_t[:] indlist, real[:] ctypesframedataresults, real[:] ear):
      cdef cython.Py_ssize_t geht,bleibt,zahlx,q,geht1,zahl
      cdef cython.Py_ssize_t indlistlen =len(indlist)
      for q in prange(liste,nogil=True):
          geht = q
          bleibt = 0
          for zahlx in range(indlistlen):
              zahl=indlist[zahlx]
              geht1 = geht // zahl # 8th grade - computer science class - the heart of the algorithm :)
              bleibt = geht % zahl
              geht = geht1
              ctypesframedataresults[q*indlistlen+zahlx] = ear[bleibt] # this was a little tricky,

And here is the Python code (more advanced than the algorithm :-) ):

    import functools
    import operator
    from collections import defaultdict
    from cythonproduct import create_product #import the compiled file
    import numpy as np
    
    def get_pointer_array(original):
        dty = np.ctypeslib.as_ctypes_type(original.dtype)
    
        b = original.ctypes.data
        buff = (dty * original.size).from_address(b)
    
        aflat = np.frombuffer(buff, dtype=original.dtype)
        return aflat
    
    def  cartesian_product(*args):
        lookup1 = {}
        lookup2 = defaultdict(list)
        i = 0
        # Create dictionaries for indexing and lookup during computation
        for arg in (args):
            for f in arg:
                lookup2[f].append(i)
                lookup1[i] = f
                i += 1
    
        indlist = np.array([len(x) for x in args],dtype=np.int64) # get the bases
        listexx = int(np.product(indlist)) # shape of output array (1 column)
        dty = np.array(list(lookup2.keys()))
        rightshape = np.array(functools.reduce(operator.add, list(lookup2.values())))
        bestdtype = (dtypecheck(dty, filterna=False, float2int=False, )) # optional - you can use your own
        # this part is used to create a numpy array that serves as a lookup dict (without hashing anything) - to that we can use all kind of values, not only 0,1,2,3,4,5...
        ear = np.zeros(rightshape.shape, dtype=bestdtype.dtype)
        for k, item in lookup1.items(): # with numpy arrays, we can disable the GIL.
            ear[k] = item # (Only a few elements - for is ok)
        dataresults = np.zeros((listexx, len(indlist)), dtype=ear.dtype)
        if not dataresults.flags['C_CONTIGUOUS']: # may not be necessary, but who knows …
            dataresults = np.ascontiguousarray(dataresults)
        # this is how you get a pointer. I don't know if there are better/safer ways (Strides maybe?)
        # Ctypes works, that is what counts. We can modify the flat array in Cython and see the results in the
        # ndim array. That way, we can pass anything to Cython. And  we can loop in Cython like
        # we would in Python (almost) using .... guess what: the Cartesian product. You can generate
        # the product on the Python side, pass the index array (always 2D) and use it as a loop index on the
        # Cython site, and no function signature will ever bother you.
        ctypesframedataresults = get_pointer_array(dataresults)
        # Let Cython do the work here and modify the memory directly. There is probably nothing in this world
        # the uses more memory than a Cartesian product :)
        create_product(listexx, indlist, ctypesframedataresults, ear)
        return dataresults
    
I posted an auto-compiling version on GitHub, in case that you want to use it, but don't know how to configure Cython: https://github.com/hansalemaos/cythoncartesian/blob/main/__init__.py

Benchmark



Benchmark: 

args2=[list(range(8)),list(range(8)),list(range(8)),list(range(8)),
      list(range(8)),list(range(8)),list(range(8)),list(range(8)),
      list(range(8))]

dataresults=cartesian_product(*args2)
      

# Mem usage 2 GB
# Out[4]:
# array([[0, 0, 0, ..., 0, 0, 0],
#        [1, 0, 0, ..., 0, 0, 0],
#        [2, 0, 0, ..., 0, 0, 0],
#        ...,
#        [5, 7, 7, ..., 7, 7, 7],
#        [6, 7, 7, ..., 7, 7, 7],
#        [7, 7, 7, ..., 7, 7, 7]], dtype=uint8)
# %timeit dataresults=cartesian_product(*args2)
# 2.65 s ± 163 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# dataresults.shape
# Out[6]: (134217728, 9)
      
      ```
## Itertools:

# itertools.product
# Mem usage 16 GB

# import itertools
# %timeit (list(itertools.product(*args2)))
# 11.5 s ± 203 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

By the way: it is also extremely fast without Cython, but I haven't done a benchmark yet.

Upvotes: -3

Eric Zinda

Reputation: 877

I like jack taylor 's implementation above because it is the only one so far that can be easily modified to stream the answers without materializing iterators passed in. So, if you are doing the product of a bunch of really big (or infinite, or expensive) iterators and you might stop before the end, you only have to materialize as much as you need.

Here's one approach to modifying it for that:

def product_stream(*args, repeat=1):
    """Find the Cartesian product of the arguments.

    The interface is identical to itertools.product.
    """
    def index_from_stream(array_stream, index):
        try:
            while index >= len(array_stream[0]):
                next_element = next(array_stream[1])
                array_stream[0].append(next_element)

            return True, array_stream[0][index]

        except StopIteration:
            return False, None

    # Initialize data structures and handle bad input
    if len(args) == 0:
        # Match behavior of itertools.product
        yield ()
        return

    gears = [([], arg) for arg in args] * repeat
    for gear in gears:
        if not index_from_stream(gear, 0)[0]:
            return

    tooth_numbers = [0] * len(gears)
    result = [index_from_stream(gear, 0)[1] for gear in gears]

    # Rotate through all gears
    last_gear_number = len(gears) - 1
    finished = False
    while not finished:
        yield tuple(result)

        # Get next result
        gear_number = last_gear_number
        while gear_number >= 0:
            gear = gears[gear_number]
            tooth_number = tooth_numbers[gear_number] + 1
            has_tooth, gear_tooth_value = index_from_stream(gear, tooth_number)
            if has_tooth:
                # No gear change is necessary, so exit the loop
                result[gear_number] = gear_tooth_value
                tooth_numbers[gear_number] = tooth_number
                break

            _, result[gear_number] = index_from_stream(gear, 0)
            tooth_numbers[gear_number] = 0
            gear_number -= 1

        else:
            # We changed all the gears, so we are back at the beginning
            finished = True

Upvotes: 1

njzk2

Reputation: 39397

If you want to reimplement it yourself, you can try with recursion. Something as simple as:

def product(cats, prefix = ()):
  if not cats:
    yield prefix
  else:
    head, *tail = cats
    for cat in head:
      yield from product(tail, prefix + (cat,))

is a working start.

The recursion depth is how many lists of categories you have.

Upvotes: 1

questionto42

Reputation: 9532

The following code is a 95% copy from Using NumPy to build an array of all combinations of two arrays; all credits go there! This is said to be much faster since it is only in NumPy.

import numpy as np

def cartesian(arrays, dtype=None, out=None):
    arrays = [np.asarray(x) for x in arrays]
    if dtype is None:
        dtype = arrays[0].dtype
    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = int(n / arrays[0].size)
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

You need to define the dtype as a parameter if you do not want to take the dtype from the first entry for all entries. Take dtype = 'object' if you have letters and numbers as items. Test:

somelists = [
   [1, 2, 3],
   ['a', 'b'],
   [4, 5]
]

[tuple(x) for x in cartesian(somelists, 'object')]

Out:

[(1, 'a', 4),
 (1, 'a', 5),
 (1, 'b', 4),
 (1, 'b', 5),
 (2, 'a', 4),
 (2, 'a', 5),
 (2, 'b', 4),
 (2, 'b', 5),
 (3, 'a', 4),
 (3, 'a', 5),
 (3, 'b', 4),
 (3, 'b', 5)]

Upvotes: 0

chriskoch

Reputation: 49

You can use itertools.product in the standard library to get the Cartesian product. Other cool, related utilities in itertools include permutations, combinations, and combinations_with_replacement. Here is a link to a Python CodePen for the snippet below:

from itertools import product

somelists = [
   [1, 2, 3],
   ['a', 'b'],
   [4, 5]
]

result = list(product(*somelists))
print(result)

Upvotes: 3

user1035648

Reputation: 521

I would use list comprehension:

somelists = [
   [1, 2, 3],
   ['a', 'b'],
   [4, 5]
]

cart_prod = [(a,b,c) for a in somelists[0] for b in somelists[1] for c in somelists[2]]

Upvotes: 43

Tyler Heers

Reputation: 132

Just to add a bit to what has already been said: if you use SymPy, you can use symbols rather than strings which makes them mathematically useful.

import itertools
import sympy

x, y = sympy.symbols('x y')

somelist = [[x,y], [1,2,3], [4,5]]
somelist2 = [[1,2], [1,2,3], [4,5]]

for element in itertools.product(*somelist):
  print element

About SymPy.

Upvotes: 2

user3850

Reputation:

In Python 2.6 and above, you can use 'itertools.product`. In older versions of Python you can use the following (almost -- see documentation) equivalent code from the documentation, at least as a starting point:

def product(*args, **kwds):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = map(tuple, args) * kwds.get('repeat', 1)
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

The result of both is an iterator, so if you really need a list for further processing, use list(result).

Upvotes: 12

SilentGhost

Reputation: 319601

With itertools.product:

import itertools
result = list(itertools.product(*somelists))

Upvotes: 36

Jack Taylor

Reputation: 6207

In 99% of cases you should use itertools.product. It is written in efficient C code, so it is probably going to be better than any custom implementation.

In the 1% of cases that you need a Python-only algorithm (for example, if you need to modify it somehow), you can use the code below.

def product(*args, repeat=1):
    """Find the Cartesian product of the arguments.

    The interface is identical to itertools.product.
    """
    # Initialize data structures and handle bad input
    if len(args) == 0:
        yield () # Match behavior of itertools.product
        return
    gears = [tuple(arg) for arg in args] * repeat
    for gear in gears:
        if len(gear) == 0:
            return
    tooth_numbers = [0] * len(gears)
    result = [gear[0] for gear in gears]

    # Rotate through all gears
    last_gear_number = len(gears) - 1
    finished = False
    while not finished:
        yield tuple(result)

        # Get next result
        gear_number = last_gear_number
        while gear_number >= 0:
            gear = gears[gear_number]
            tooth_number = tooth_numbers[gear_number] + 1
            if tooth_number < len(gear):
                # No gear change is necessary, so exit the loop
                result[gear_number] = gear[tooth_number]
                tooth_numbers[gear_number] = tooth_number
                break
            result[gear_number] = gear[0]
            tooth_numbers[gear_number] = 0
            gear_number -= 1
        else:
            # We changed all the gears, so we are back at the beginning
            finished = True

The interface is the same as for itertools.product. For example:

>>> list(product((1, 2), "ab"))
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

This algorithm has the following advantages over other Python-only solutions on this page:

It does not build up intermediate results in memory, keeping the memory footprint small.
It uses iteration instead of recursion, meaning you will not get "maximum recursion depth exceeded" errors.
It can accept any number of input iterables, making it more flexible than using nested for loops.

This code is based on the itertools.product algorithm from PyPy, which is released under the MIT licence.

Upvotes: 2

Kenan Banks

Reputation: 211982

Use itertools.product, which has been available since Python 2.6.

import itertools

somelists = [
   [1, 2, 3],
   ['a', 'b'],
   [4, 5]
]
for element in itertools.product(*somelists):
    print(element)

This is the same as:

for element in itertools.product([1, 2, 3], ['a', 'b'], [4, 5]):
    print(element)

Upvotes: 664

Sergio Polimante

Reputation: 189

This can be done as

[(x, y) for x in range(10) for y in range(10)]

another variable? No problem:

[(x, y, z) for x in range(10) for y in range(10) for z in range(10)]

Upvotes: 1

Lucas Schwartz

Reputation: 174

List comprehension is simple and clean:

import itertools

somelists = [
   [1, 2, 3],
   ['a', 'b'],
   [4, 5]
]
lst = [i for i in itertools.product(*somelists)]

Upvotes: 1

Richard Samuelson

Reputation: 17

I believe this works:

def cartesian_product(L):  
   if L:
       return {(a,) + b for a in L[0] 
                        for b in cartesian_product(L[1:])}
   else:
       return {()}

Upvotes: 0

Mike Lu

Reputation: 31

A minor modification to the above recursive generator solution in variadic flavor:

def product_args(*args):
    if args:
        for a in args[0]:
            for prod in product_args(*args[1:]) if args[1:] else ((),):
                yield (a,) + prod

And of course a wrapper which makes it work exactly the same as that solution:

def product2(ar_list):
    """
    >>> list(product(()))
    [()]
    >>> list(product2(()))
    []
    """
    return product_args(*ar_list)

with one trade-off: it checks if recursion should break upon each outer loop, and one gain: no yield upon empty call, e.g.product(()), which I suppose would be semantically more correct (see the doctest).

Regarding list comprehension: the mathematical definition applies to an arbitrary number of arguments, while list comprehension could only deal with a known number of them.

Upvotes: 3

weiyixie

Reputation: 581

Although there are many answers already, I would like to share some of my thoughts:

Iterative approach

def cartesian_iterative(pools):
  result = [[]]
  for pool in pools:
    result = [x+[y] for x in result for y in pool]
  return result

Recursive Approach

def cartesian_recursive(pools):
  if len(pools) > 2:
    pools[0] = product(pools[0], pools[1])
    del pools[1]
    return cartesian_recursive(pools)
  else:
    pools[0] = product(pools[0], pools[1])
    del pools[1]
    return pools
def product(x, y):
  return [xx + [yy] if isinstance(xx, list) else [xx] + [yy] for xx in x for yy in y]

Lambda Approach

def cartesian_reduct(pools):
  return reduce(lambda x,y: product(x,y) , pools)

Upvotes: 10

Anurag Uniyal

Reputation: 88747

Here is a recursive generator, which doesn't store any temporary lists

def product(ar_list):
    if not ar_list:
        yield ()
    else:
        for a in ar_list[0]:
            for prod in product(ar_list[1:]):
                yield (a,)+prod

print list(product([[1,2],[3,4],[5,6]]))

Output:

[(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]

Upvotes: 19

jfs

Reputation: 414205

For Python 2.5 and older:

>>> [(a, b, c) for a in [1,2,3] for b in ['a','b'] for c in [4,5]]
[(1, 'a', 4), (1, 'a', 5), (1, 'b', 4), (1, 'b', 5), (2, 'a', 4), 
 (2, 'a', 5), (2, 'b', 4), (2, 'b', 5), (3, 'a', 4), (3, 'a', 5), 
 (3, 'b', 4), (3, 'b', 5)]

Here's a recursive version of product() (just an illustration):

def product(*args):
    if not args:
        return iter(((),)) # yield tuple()
    return (items + (item,) 
            for items in product(*args[:-1]) for item in args[-1])

Example:

>>> list(product([1,2,3], ['a','b'], [4,5])) 
[(1, 'a', 4), (1, 'a', 5), (1, 'b', 4), (1, 'b', 5), (2, 'a', 4), 
 (2, 'a', 5), (2, 'b', 4), (2, 'b', 5), (3, 'a', 4), (3, 'a', 5), 
 (3, 'b', 4), (3, 'b', 5)]
>>> list(product([1,2,3]))
[(1,), (2,), (3,)]
>>> list(product([]))
[]
>>> list(product())
[()]

Upvotes: 46

Jason Baker

Reputation: 198577

import itertools
>>> for i in itertools.product([1,2,3],['a','b'],[4,5]):
...         print i
...
(1, 'a', 4)
(1, 'a', 5)
(1, 'b', 4)
(1, 'b', 5)
(2, 'a', 4)
(2, 'a', 5)
(2, 'b', 4)
(2, 'b', 5)
(3, 'a', 4)
(3, 'a', 5)
(3, 'b', 4)
(3, 'b', 5)
>>>

Upvotes: 126

How to get the Cartesian product of multiple lists

Answers (20)

This code here is 6 times faster than itertools.product and uses 10 times less memory.

It uses only very basic operators (programming class in middle school) and that is why it is so efficient.

Here is an explanation how I came up with it

Some background first:

What method did you use to do that?

Let's take a look at a code and the corresponding result of a Cartesian product:

Looking again at the code:

Benchmark

Iterative approach

Recursive Approach

Lambda Approach

Related Questions