Reputation: 48416
How can I get the Cartesian product (every possible combination of values) from a group of lists?
For example, given
somelists = [
[1, 2, 3],
['a', 'b'],
[4, 5]
]
How do I get this?
[(1, 'a', 4), (1, 'a', 5), (1, 'b', 4), (1, 'b', 5), (2, 'a', 4), (2, 'a', 5), ...]
One common application for this technique is to avoid deeply nested loops. See Avoiding nested for loops for a more specific duplicate. Similarly, this technique might be used to "explode" a dictionary with list values; see Combine Python Dictionary Permutations into List of Dictionaries .
If you want a Cartesian product of the same list with itself multiple times, itertools.product
can handle that elegantly. See Operation on every pair of element in a list or How can I get "permutations with repetitions" from a list (Cartesian product of a list with itself)?.
Many people who already know about itertools.product
struggle with the fact that it expects separate arguments for each input sequence, rather than e.g. a list of lists. The accepted answer shows how to handle this with *
. However, the use of *
here to unpack arguments is fundamentally not different from any other time it's used in a function call. Please see Expanding tuples into arguments for this topic (and use that instead to close duplicate questions, as appropriate).
Upvotes: 543
Views: 409454
Reputation: 3310
Recursive Approach:
def rec_cart(start, array, partial, results):
if len(partial) == len(array):
results.append(partial)
return
for element in array[start]:
rec_cart(start+1, array, partial+[element], results)
rec_res = []
some_lists = [[1, 2, 3], ['a', 'b'], [4, 5]]
rec_cart(0, some_lists, [], rec_res)
print(rec_res)
Iterative Approach:
def itr_cart(array):
results = [[]]
for i in range(len(array)):
temp_res = []
for res in results:
for element in array[i]:
temp_res.append(res+[element])
results = temp_res
return results
some_lists = [[1, 2, 3], ['a', 'b'], [4, 5]]
itr_res = itr_cart(some_lists)
print(itr_res)
Upvotes: 10
Reputation: 520
Probably you learned binary numbers in school/uni and did a lot of exercises to get more comfortable in understanding this system. It is strange leading with a limited set of numbers, right? That means you have already created a lot of Cartesian products and have some good practice.
Maybe, your teacher told you to use mainly these 3 operators: / and % and * that was all what you needed (at least in my case, it was).
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
This code: product(range(0,5),range(0,7),range(0,8)) generates that:
(0, 0, 0),
(0, 0, 1),
(0, 0, 2),
(0, 0, 3),
(0, 0, 4),
(0, 0, 5),
(0, 0, 6),
(0, 0, 7),
(0, 1, 0),
(0, 1, 1),
(0, 1, 2),
(0, 1, 3),
(0, 1, 4),
(0, 1, 5),
(0, 1, 6),
(0, 1, 7),
(0, 2, 0),
(0, 2, 1),
(0, 2, 2),
(0, 2, 3),
(0, 2, 4),
(0, 2, 5),
(0, 2, 6),
(0, 2, 7),
(0, 3, 0),
..
It looks almost the same, right? Almost like the binary table.
When I noticed that, I got a feeling that there were great chances to write an algorithm using mainly / and % and * - no recursion, no concatenation of lists, no generators - nothing, just 7th grade computer basics ...
And it was surprisingly easy.
The only difference there is compared to the binary data is that the base changes often [different length of input data] in every column. When the length of the first input iterator is reached [ range(0,8) ], the second column starts all of a sudden, but not with 8 like in our decimal system. It is 10 after 7, because the base of the first column is 8 [ range(0,8) ], . And there is nothing left after 7
(0, 0, 0),
(0, 0, 1),
(0, 0, 2),
(0, 0, 3),
(0, 0, 4),
(0, 0, 5),
(0, 0, 6),
(0, 0, 7),
(0, 1, 0),
(0, 1, 1),
Now it becomes very clear what the bases of the other columns are. (If it's not that clear, use len() )
(range(0,5),range(0,7),range(0,8))
And this is all what you need to get it going:
from cython.parallel cimport prange
cimport cython
import numpy as np
cimport numpy as np
import cython
ctypedef fused real:
cython.char
.... MORE DATA TYPES
cython.longdoublecomplex
cython.Py_hash_t
cython.Py_UCS4
cpdef void create_product(cython.Py_ssize_t liste, cython.Py_ssize_t[:] indlist, real[:] ctypesframedataresults, real[:] ear):
cdef cython.Py_ssize_t geht,bleibt,zahlx,q,geht1,zahl
cdef cython.Py_ssize_t indlistlen =len(indlist)
for q in prange(liste,nogil=True):
geht = q
bleibt = 0
for zahlx in range(indlistlen):
zahl=indlist[zahlx]
geht1 = geht // zahl # 8th grade - computer science class - the heart of the algorithm :)
bleibt = geht % zahl
geht = geht1
ctypesframedataresults[q*indlistlen+zahlx] = ear[bleibt] # this was a little tricky,
And here is the Python code (more advanced than the algorithm :-) ):
import functools
import operator
from collections import defaultdict
from cythonproduct import create_product #import the compiled file
import numpy as np
def get_pointer_array(original):
dty = np.ctypeslib.as_ctypes_type(original.dtype)
b = original.ctypes.data
buff = (dty * original.size).from_address(b)
aflat = np.frombuffer(buff, dtype=original.dtype)
return aflat
def cartesian_product(*args):
lookup1 = {}
lookup2 = defaultdict(list)
i = 0
# Create dictionaries for indexing and lookup during computation
for arg in (args):
for f in arg:
lookup2[f].append(i)
lookup1[i] = f
i += 1
indlist = np.array([len(x) for x in args],dtype=np.int64) # get the bases
listexx = int(np.product(indlist)) # shape of output array (1 column)
dty = np.array(list(lookup2.keys()))
rightshape = np.array(functools.reduce(operator.add, list(lookup2.values())))
bestdtype = (dtypecheck(dty, filterna=False, float2int=False, )) # optional - you can use your own
# this part is used to create a numpy array that serves as a lookup dict (without hashing anything) - to that we can use all kind of values, not only 0,1,2,3,4,5...
ear = np.zeros(rightshape.shape, dtype=bestdtype.dtype)
for k, item in lookup1.items(): # with numpy arrays, we can disable the GIL.
ear[k] = item # (Only a few elements - for is ok)
dataresults = np.zeros((listexx, len(indlist)), dtype=ear.dtype)
if not dataresults.flags['C_CONTIGUOUS']: # may not be necessary, but who knows …
dataresults = np.ascontiguousarray(dataresults)
# this is how you get a pointer. I don't know if there are better/safer ways (Strides maybe?)
# Ctypes works, that is what counts. We can modify the flat array in Cython and see the results in the
# ndim array. That way, we can pass anything to Cython. And we can loop in Cython like
# we would in Python (almost) using .... guess what: the Cartesian product. You can generate
# the product on the Python side, pass the index array (always 2D) and use it as a loop index on the
# Cython site, and no function signature will ever bother you.
ctypesframedataresults = get_pointer_array(dataresults)
# Let Cython do the work here and modify the memory directly. There is probably nothing in this world
# the uses more memory than a Cartesian product :)
create_product(listexx, indlist, ctypesframedataresults, ear)
return dataresults
I posted an auto-compiling version on GitHub, in case that you want to use it, but don't know how to configure Cython: https://github.com/hansalemaos/cythoncartesian/blob/main/__init__.py
Benchmark:
args2=[list(range(8)),list(range(8)),list(range(8)),list(range(8)),
list(range(8)),list(range(8)),list(range(8)),list(range(8)),
list(range(8))]
dataresults=cartesian_product(*args2)
# Mem usage 2 GB
# Out[4]:
# array([[0, 0, 0, ..., 0, 0, 0],
# [1, 0, 0, ..., 0, 0, 0],
# [2, 0, 0, ..., 0, 0, 0],
# ...,
# [5, 7, 7, ..., 7, 7, 7],
# [6, 7, 7, ..., 7, 7, 7],
# [7, 7, 7, ..., 7, 7, 7]], dtype=uint8)
# %timeit dataresults=cartesian_product(*args2)
# 2.65 s ± 163 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# dataresults.shape
# Out[6]: (134217728, 9)
```
## Itertools:
# itertools.product
# Mem usage 16 GB
# import itertools
# %timeit (list(itertools.product(*args2)))
# 11.5 s ± 203 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
By the way: it is also extremely fast without Cython, but I haven't done a benchmark yet.
Upvotes: -3
Reputation: 877
I like jack taylor 's implementation above because it is the only one so far that can be easily modified to stream the answers without materializing iterators passed in. So, if you are doing the product of a bunch of really big (or infinite, or expensive) iterators and you might stop before the end, you only have to materialize as much as you need.
Here's one approach to modifying it for that:
def product_stream(*args, repeat=1):
"""Find the Cartesian product of the arguments.
The interface is identical to itertools.product.
"""
def index_from_stream(array_stream, index):
try:
while index >= len(array_stream[0]):
next_element = next(array_stream[1])
array_stream[0].append(next_element)
return True, array_stream[0][index]
except StopIteration:
return False, None
# Initialize data structures and handle bad input
if len(args) == 0:
# Match behavior of itertools.product
yield ()
return
gears = [([], arg) for arg in args] * repeat
for gear in gears:
if not index_from_stream(gear, 0)[0]:
return
tooth_numbers = [0] * len(gears)
result = [index_from_stream(gear, 0)[1] for gear in gears]
# Rotate through all gears
last_gear_number = len(gears) - 1
finished = False
while not finished:
yield tuple(result)
# Get next result
gear_number = last_gear_number
while gear_number >= 0:
gear = gears[gear_number]
tooth_number = tooth_numbers[gear_number] + 1
has_tooth, gear_tooth_value = index_from_stream(gear, tooth_number)
if has_tooth:
# No gear change is necessary, so exit the loop
result[gear_number] = gear_tooth_value
tooth_numbers[gear_number] = tooth_number
break
_, result[gear_number] = index_from_stream(gear, 0)
tooth_numbers[gear_number] = 0
gear_number -= 1
else:
# We changed all the gears, so we are back at the beginning
finished = True
Upvotes: 1
Reputation: 39397
If you want to reimplement it yourself, you can try with recursion. Something as simple as:
def product(cats, prefix = ()):
if not cats:
yield prefix
else:
head, *tail = cats
for cat in head:
yield from product(tail, prefix + (cat,))
is a working start.
The recursion depth is how many lists of categories you have.
Upvotes: 1
Reputation: 9532
The following code is a 95% copy from Using NumPy to build an array of all combinations of two arrays; all credits go there! This is said to be much faster since it is only in NumPy.
import numpy as np
def cartesian(arrays, dtype=None, out=None):
arrays = [np.asarray(x) for x in arrays]
if dtype is None:
dtype = arrays[0].dtype
n = np.prod([x.size for x in arrays])
if out is None:
out = np.zeros([n, len(arrays)], dtype=dtype)
m = int(n / arrays[0].size)
out[:,0] = np.repeat(arrays[0], m)
if arrays[1:]:
cartesian(arrays[1:], out=out[0:m, 1:])
for j in range(1, arrays[0].size):
out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
return out
You need to define the dtype as a parameter if you do not want to take the dtype from the first entry for all entries. Take dtype = 'object' if you have letters and numbers as items. Test:
somelists = [
[1, 2, 3],
['a', 'b'],
[4, 5]
]
[tuple(x) for x in cartesian(somelists, 'object')]
Out:
[(1, 'a', 4),
(1, 'a', 5),
(1, 'b', 4),
(1, 'b', 5),
(2, 'a', 4),
(2, 'a', 5),
(2, 'b', 4),
(2, 'b', 5),
(3, 'a', 4),
(3, 'a', 5),
(3, 'b', 4),
(3, 'b', 5)]
Upvotes: 0
Reputation: 49
You can use itertools.product
in the standard library to get the Cartesian product. Other cool, related utilities in itertools
include permutations
, combinations
, and combinations_with_replacement
. Here is a link to a Python CodePen for the snippet below:
from itertools import product
somelists = [
[1, 2, 3],
['a', 'b'],
[4, 5]
]
result = list(product(*somelists))
print(result)
Upvotes: 3
Reputation: 521
I would use list comprehension:
somelists = [
[1, 2, 3],
['a', 'b'],
[4, 5]
]
cart_prod = [(a,b,c) for a in somelists[0] for b in somelists[1] for c in somelists[2]]
Upvotes: 43
Reputation: 132
Just to add a bit to what has already been said: if you use SymPy, you can use symbols rather than strings which makes them mathematically useful.
import itertools
import sympy
x, y = sympy.symbols('x y')
somelist = [[x,y], [1,2,3], [4,5]]
somelist2 = [[1,2], [1,2,3], [4,5]]
for element in itertools.product(*somelist):
print element
Upvotes: 2
Reputation:
In Python 2.6 and above, you can use 'itertools.product`. In older versions of Python you can use the following (almost -- see documentation) equivalent code from the documentation, at least as a starting point:
def product(*args, **kwds):
# product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
# product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
pools = map(tuple, args) * kwds.get('repeat', 1)
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
for prod in result:
yield tuple(prod)
The result of both is an iterator, so if you really need a list for further processing, use list(result)
.
Upvotes: 12
Reputation: 319601
With itertools.product:
import itertools
result = list(itertools.product(*somelists))
Upvotes: 36
Reputation: 6207
In 99% of cases you should use itertools.product. It is written in efficient C code, so it is probably going to be better than any custom implementation.
In the 1% of cases that you need a Python-only algorithm (for example, if you need to modify it somehow), you can use the code below.
def product(*args, repeat=1):
"""Find the Cartesian product of the arguments.
The interface is identical to itertools.product.
"""
# Initialize data structures and handle bad input
if len(args) == 0:
yield () # Match behavior of itertools.product
return
gears = [tuple(arg) for arg in args] * repeat
for gear in gears:
if len(gear) == 0:
return
tooth_numbers = [0] * len(gears)
result = [gear[0] for gear in gears]
# Rotate through all gears
last_gear_number = len(gears) - 1
finished = False
while not finished:
yield tuple(result)
# Get next result
gear_number = last_gear_number
while gear_number >= 0:
gear = gears[gear_number]
tooth_number = tooth_numbers[gear_number] + 1
if tooth_number < len(gear):
# No gear change is necessary, so exit the loop
result[gear_number] = gear[tooth_number]
tooth_numbers[gear_number] = tooth_number
break
result[gear_number] = gear[0]
tooth_numbers[gear_number] = 0
gear_number -= 1
else:
# We changed all the gears, so we are back at the beginning
finished = True
The interface is the same as for itertools.product. For example:
>>> list(product((1, 2), "ab"))
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
This algorithm has the following advantages over other Python-only solutions on this page:
This code is based on the itertools.product algorithm from PyPy, which is released under the MIT licence.
Upvotes: 2
Reputation: 211982
Use itertools.product
, which has been available since Python 2.6.
import itertools
somelists = [
[1, 2, 3],
['a', 'b'],
[4, 5]
]
for element in itertools.product(*somelists):
print(element)
This is the same as:
for element in itertools.product([1, 2, 3], ['a', 'b'], [4, 5]):
print(element)
Upvotes: 664
Reputation: 189
This can be done as
[(x, y) for x in range(10) for y in range(10)]
another variable? No problem:
[(x, y, z) for x in range(10) for y in range(10) for z in range(10)]
Upvotes: 1
Reputation: 174
List comprehension is simple and clean:
import itertools
somelists = [
[1, 2, 3],
['a', 'b'],
[4, 5]
]
lst = [i for i in itertools.product(*somelists)]
Upvotes: 1
Reputation: 17
I believe this works:
def cartesian_product(L):
if L:
return {(a,) + b for a in L[0]
for b in cartesian_product(L[1:])}
else:
return {()}
Upvotes: 0
Reputation: 31
A minor modification to the above recursive generator solution in variadic flavor:
def product_args(*args):
if args:
for a in args[0]:
for prod in product_args(*args[1:]) if args[1:] else ((),):
yield (a,) + prod
And of course a wrapper which makes it work exactly the same as that solution:
def product2(ar_list):
"""
>>> list(product(()))
[()]
>>> list(product2(()))
[]
"""
return product_args(*ar_list)
with one trade-off: it checks if recursion should break upon each outer loop, and one gain: no yield upon empty call, e.g.product(())
, which I suppose would be semantically more correct (see the doctest).
Regarding list comprehension: the mathematical definition applies to an arbitrary number of arguments, while list comprehension could only deal with a known number of them.
Upvotes: 3
Reputation: 581
Although there are many answers already, I would like to share some of my thoughts:
def cartesian_iterative(pools):
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
return result
def cartesian_recursive(pools):
if len(pools) > 2:
pools[0] = product(pools[0], pools[1])
del pools[1]
return cartesian_recursive(pools)
else:
pools[0] = product(pools[0], pools[1])
del pools[1]
return pools
def product(x, y):
return [xx + [yy] if isinstance(xx, list) else [xx] + [yy] for xx in x for yy in y]
def cartesian_reduct(pools):
return reduce(lambda x,y: product(x,y) , pools)
Upvotes: 10
Reputation: 88747
Here is a recursive generator, which doesn't store any temporary lists
def product(ar_list):
if not ar_list:
yield ()
else:
for a in ar_list[0]:
for prod in product(ar_list[1:]):
yield (a,)+prod
print list(product([[1,2],[3,4],[5,6]]))
Output:
[(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]
Upvotes: 19
Reputation: 414205
For Python 2.5 and older:
>>> [(a, b, c) for a in [1,2,3] for b in ['a','b'] for c in [4,5]]
[(1, 'a', 4), (1, 'a', 5), (1, 'b', 4), (1, 'b', 5), (2, 'a', 4),
(2, 'a', 5), (2, 'b', 4), (2, 'b', 5), (3, 'a', 4), (3, 'a', 5),
(3, 'b', 4), (3, 'b', 5)]
Here's a recursive version of product()
(just an illustration):
def product(*args):
if not args:
return iter(((),)) # yield tuple()
return (items + (item,)
for items in product(*args[:-1]) for item in args[-1])
Example:
>>> list(product([1,2,3], ['a','b'], [4,5]))
[(1, 'a', 4), (1, 'a', 5), (1, 'b', 4), (1, 'b', 5), (2, 'a', 4),
(2, 'a', 5), (2, 'b', 4), (2, 'b', 5), (3, 'a', 4), (3, 'a', 5),
(3, 'b', 4), (3, 'b', 5)]
>>> list(product([1,2,3]))
[(1,), (2,), (3,)]
>>> list(product([]))
[]
>>> list(product())
[()]
Upvotes: 46
Reputation: 198577
import itertools
>>> for i in itertools.product([1,2,3],['a','b'],[4,5]):
... print i
...
(1, 'a', 4)
(1, 'a', 5)
(1, 'b', 4)
(1, 'b', 5)
(2, 'a', 4)
(2, 'a', 5)
(2, 'b', 4)
(2, 'b', 5)
(3, 'a', 4)
(3, 'a', 5)
(3, 'b', 4)
(3, 'b', 5)
>>>
Upvotes: 126