Xia
Xia

Reputation: 31

Fastest way to generate n combinations in python

I have an array of 10 elements and am looking for the fastest way to randomly generate n combinations of size m in Python with those 10 elements.

I have to do that for all possible sizes of the combinations (m goes from 1 to the number of elements) and n also varies, it increases when the number of combination possible increases.

For example if my input array is: [a, b, c, d, e, f, g, h, i, j]
for n = 4 and m = 3 the output should be:

(e, b, c)
(i, a, d)
(g, j, e)
(i, f, e)

There can't be twice the same element in one permutation.

I am aware of the itertool function that generates all combination of a given size, but I only need n combinations, not all. So I am not sure that using itertools is the fastest solution for my problem (since I have then to select randomly n combinations among all those generated).

Upvotes: 2

Views: 871

Answers (6)

Kim
Kim

Reputation: 1684

The more-itertools library has some useful functions that can index into a set of combinations at high speed.

from more_itertools import combination_index, nth_combination
import random
import pprint

digit_range = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
combination_size = 3
sample_size = 10
domain = digit_range * combination_size
last_combination = tuple(digit_range[-1] * combination_size)
last_index = combination_index(last_combination, domain)
sample_indexes = random.sample(range(0, last_index + 1), sample_size)
sample = [
    nth_combination(domain, combination_size, index)
    for index
    in sample_indexes
]
pprint.pprint(sorted(sample), indent=4)

Results:

python rand_combinations.py 
[   ('b', 'e', 'i'),
    ('b', 'f', 'a'),
    ('b', 'i', 'j'),
    ('c', 'j', 'd'),
    ('d', 'j', 'h'),
    ('f', 'g', 'b'),
    ('g', 'b', 'c'),
    ('g', 'f', 'g'),
    ('h', 'a', 'd'),
    ('i', 'c', 'g')]

The code above selects thousands of random, unique combinations in a second or so.

Method:

  1. construct the last possible combination
  2. get its index using more_itertools.combination_index()
  3. get a sample of unique integer indexes between 0 and the last index using random.sample()
  4. convert the indexes to their corresponding combinations using more_itertools.nth_combination()

Upvotes: 1

selbie
selbie

Reputation: 104569

For a small list, this seems as good as anything else:

def shuffle(items, m=len(items)):
   for i in range(m):
       j = math.floor(random.random() * len(items))
       items[i],items[j] = items[j],items[i]

def makeRandomPermutations(items, n, m):
   clone = items[0:]
   permutations=[]
   for i in range(n):
       shuffle(clone,m)
       permutations.append(clone[0:m])
   return permutations

Example:

>>> items=['a','b','d','e','f','g','h','i','j']
>>> perms = makeRandomPermutations(items,4,3)
>>> for p in perms:
...      print(p)

['b', 'i', 'j']
['a', 'g', 'e']
['e', 'j', 'a']
['a', 'd', 'h']
>>>

Upvotes: 0

Pierre Alex
Pierre Alex

Reputation: 473

Using itertools will be the fastest on every platforms because of its efficient implementation. Combinations are not computed upfront but only when iterating.

Doing this should give you the best results :

from itertools import combinations, islice

#: Generates an iterator for 10 elements among all combinations
#: Note that it is not random at all
input_data = [1, 2, 3, 4, 5]
m_size = 3

all_combinations = combinations(input_data, m_size)
subset = islice(all_combinations, 10)


#: To add randomness
from random import shuffle

shuffle(input_data)

all_combinations = combinations(input_data, m_size)
subset = islice(all_combinations, 10)



Upvotes: 0

Sharim09
Sharim09

Reputation: 6224

Unique Element

For this approach make sure to not pass m(sample size) value greater than the array(population size) size. Or you can add an if statement to check the m value.

import random

def generate_combinations(array, m, n):
    """Generate n random combinations of size m"""
    if m > len(array): # Changing the m value to array length if it is greater than length of array
        m = len(array)
    return [random.sample(array, k=m) for _ in range(n)] # for unique element

array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = 5  # Number of combinations
m = 10

print(generate_combinations(array, m, n))

Sample Output

[[8, 1, 2, 5, 4, 3, 6, 7, 10, 9],
 [9, 6, 7, 1, 8, 2, 10, 3, 4, 5],
 [8, 3, 7, 10, 6, 9, 2, 5, 1, 4],
 [10, 8, 9, 2, 6, 5, 7, 4, 1, 3],
 [5, 4, 1, 7, 10, 6, 3, 9, 8, 2]]

For non-unique element

import random

def generate_combinations(array, m, n):
    """Generate n random combinations of size m using list comprehension"""
    return [random.choices(array, k=m) for _ in range(n)]

array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = 5  # Number of combinations
m = 10

print(generate_combinations(array, m, n))

Sample Output

[[9, 8, 4, 2, 5, 4, 5, 5, 6, 1],
 [2, 6, 10, 3, 6, 1, 7, 6, 2, 4],
 [1, 9, 7, 9, 8, 2, 10, 1, 1, 7],
 [10, 4, 5, 7, 8, 9, 5, 1, 4, 1],
 [9, 3, 3, 5, 9, 4, 9, 8, 10, 6]]

or you can use numpy (Unique)

import numpy as np
def generate_combinations(array, m, n):
    if m> len(array):
        m = len(array)
    return [np.random.choice(array, (1, m), replace=False).tolist()[0] for _ in range(n)]

array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = 5  # Number of combinations
m = 10

print(generate_combinations(array, m, n))

Sample Output

[[6, 7, 9, 10, 8, 4, 1, 2, 3, 5],
 [8, 5, 2, 10, 7, 3, 9, 4, 1, 6],
 [8, 1, 4, 10, 3, 6, 9, 7, 2, 5],
 [7, 4, 8, 5, 3, 9, 6, 10, 1, 2],
 [1, 8, 7, 3, 9, 2, 5, 4, 10, 6]]

Upvotes: 2

mozway
mozway

Reputation: 262204

A comment on the existing answers.

Disclaimer, as originally noted in a comment, my preconception was that codester_09 was the most straightforward and likely the most efficient.

First the incorrect answers:

  • playerJX1: this is only generating the first N combinations of items, which is already available using itertools.combinations.
  • DarkKnight: this doesn't allow to reuse the same item in different combinations, which is not part of the expected logic
  • Pierre Alex: the code is incorrect and won't run (improper use of combinations and shuffle)

Now a timing of the two correct answers codester_09 and selbie.

I generated random input lists, and set m and n to be 1/100th and 1/10th of the length of the input, then computed the total running time and plotted with perfplot.

I had to wrap selbie's makeRandomPermutation function to repeat it n times:

def selbie(array):
    m = len(array)//100
    n = len(array)//10
    return [makeRandomPermutation(array, m) for _ in range(n)]

enter image description here

The winner is the simple: [random.sample(array, k=m) for _ in range(n)]

Upvotes: 1

playerJX1
playerJX1

Reputation: 219

Modifying this solution which originally gives all combinations at time complexity O(N^2). This is the modified version:

cnt=0
def printCombination(arr, size, m,n):
    global cnt
    data = [0]*m
    combinationUtil(arr, data, 0,
                    size - 1, 0, m,n)
 
def combinationUtil(arr, data, start,
                    end, index, m, n):
    global cnt
    if (index == m):
        for j in range(m):
            print(data[j], end = " ");
        print()
        cnt+=1
        return

    i = start;
    while(i <= end and end - i + 1 >= m - index and cnt<n):
        data[index] = arr[i]
        combinationUtil(arr, data, i + 1,
                        end, index + 1, m, n)
        i += 1
arr = [1, 2, 3, 4, 5]
m = 3
size = len(arr)
n= 5
printCombination(arr, size, m, n)

This will print all combinations if needed but isn't completely random (not sure if that's you want or not)

Upvotes: 0

Related Questions