fulmicoton
fulmicoton

Reputation: 15968

Python's itertools product memory consumption

The documentation says that the cartesian product function

the actual implementation does not build up intermediate results in memory.

How can that be possible with generators? Can somebody show me an example with a bounded memory consumption for 2 generators?

Upvotes: 15

Views: 2109

Answers (2)

NPE
NPE

Reputation: 500427

Looking at the module's source code, itertools.product() actually converts every argument to a tuple:

// product_new() in itertoolsmodule.c
for (i=0; i < nargs ; ++i) {
    PyObject *item = PyTuple_GET_ITEM(args, i);
    PyObject *pool = PySequence_Tuple(item); //<==== Call tuple(arg)
    if (pool == NULL)
        goto error;
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}

In other words, itertools.product()'s memory consumption appears to be linear in the size of the input arguments.

Upvotes: 12

Eli Bendersky
Eli Bendersky

Reputation: 273536

Well, it also says:

The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.

This is pretty much how it works in the implementation (Modules/itertoolsmodule.c)

Here is the state object:

typedef struct {
    PyObject_HEAD
    PyObject *pools;       /* tuple of pool tuples */
    Py_ssize_t *indices;   /* one index per pool */
    PyObject *result;      /* most recently returned result tuple */
    int stopped;           /* set to 1 when the product iterator is exhausted */
} productobject;

And the next item is returned by the function product_next, which uses this state and the algorithm described in the quote to generate the next state. See this answer to understand the memory requirements.

For general education, you can read about how to create generators with state from C extensions here.

Upvotes: 4

Related Questions