UnderscorePoY
UnderscorePoY

Reputation: 21

Why isn't Python itertools.combinations() in O(1) amortized time complexity?

The itertools.combinations function of the CPython's standard library doesn't have an $O(1)$ amortized time complexity. When generating $k$-combinations of $n$ elements, the main issue arises from when $k$ gets close to $n$. We end up with an $O(n)$ amortized time complexity.

  1. Why such a choice from Python core developers ?
  2. Does there exist a code, whether it be a from a standard library or not, that guarantees an O(1) amortized time complexity ?

Quick details of why i ended up with amortized time complexity of $O(n)$ : In that code, the number of combinations ensuring the pivot index is $i$ is $\binom{n-k+i}{i+1}$ and the number of array operations for such an instance is $3k-3i-1$. By multipliying both quantities and summing over all possible values of $i$, we obtain the total number of operations. Dividing this by $\binom{n}{k}$ the number of combinations, we end up with the amortized time complexity of $O(n)$.

The following self-contained C code is a rewrite of the CPython itertools.combinations function made for handling k-combinations of an array of integers between 0 and n-1, in an iterator-type style.

#include <stdlib.h> // malloc
#include <stdio.h> // printf
#include <stdbool.h> //bool, true, false

/* TESTING */
int* integersInOrder(int size){
    int* array = malloc(size * sizeof(int));
    for(int i=0; i < size; i++)
        array[i] = i;
    return array;
}

void printArray(int* array, int size){
    for(int i=0; i<size; i++)
        printf("%d ", (int)array[i]);
    printf("\n");
}

/* itertools.combinations */
static const bool IS_RUNNING = true, IS_DONE = false;

bool nextCombination(int* indices, int n, int k){
    if(k > n)
        return IS_DONE;
    
    bool isFound = false;
    int i;
    // find furthest-right non-maximal index
    for(i = k - 1; i >= 0; i--){
        if(indices[i] != n - k + i){
            isFound = true;
            break;
        }
    }
    if(!isFound)
        return IS_DONE;
    
    // increase index i by 1
    indices[i]++;
    // set all indices to the right of i to be the lexicographical minimum values
    for(int j = i + 1; j < k; j++)
        indices[j] = indices[j - 1] + 1;
   
    return IS_RUNNING;
}

/* ENTRY POINT */
int main()
{
    int n = 5, k = 3;
    int* array = integersInOrder(k);
    printArray(array, k);
    
    while(IS_RUNNING == nextCombination(array, n, k)){
        printArray(array, k);
    }

    return 0;
}

Upvotes: 2

Views: 113

Answers (1)

Jasmijn
Jasmijn

Reputation: 10477

The problem with your solution is that you modify array in place. If CPython were to adapt your implementation, the result of list(itertools.combinations(range(4), 3)) would be something like [[1, 2, 3], [1, 2, 3], [1, 2, 3]], which would be surprising to users. You would need to make a copy of the first k entries in array and yield that, which will take O(k) at the minimum, which, as k approaches n, is O(n).

You can't escape these copies without breaking user's expectations.

Upvotes: 1

Related Questions