Reputation: 704
Consider array a
, holding some of the permutations of 1,2,3,4. (my actual arrays may be larger)
import numpy as np
n = 3
a = np.array([[1, 2, 3, 4],
[1, 2, 4, 3],
[1, 3, 4, 2],
[1, 4, 3, 2],
[2, 3, 4, 1],
[2, 4, 3, 1],
[3, 1, 4, 2],
[4, 1, 2, 3]]))
I want to identify sets of n
rows (in this example, n=3) where each row position holds unique values.
In this example, the output would be:
out = [[0, 4, 7],
[2, 5, 7],
[3, 4, 7]]
The 1st row of out
indicates that a[0], a[4], and a[7] have unique values in each row position
When n
= 2, there are 11 row pairs that match the criteria: [[0,4], [0,6], [0,7], [1,5] ...etc
When n
= 4, there are 0 rows that match the criteria.
I'm new enough to python that I can't find a good way to approach this situation.
Upvotes: 1
Views: 142
Reputation: 50698
Solving this problem efficiently is far from not easy. Indeed, the brute-force solution consisting is using n
nested loop is very inefficient: its complexity is O(c r! / (r-n)!)
where r
is the number of rows of a
and c
is the number of columns of a
(note that !
is the factorial). Since r
is a number of permutation which already grow experientially with the number of unique items in a
, this means the complexity of this solution is really bad.
A more efficient solution (but still not great) is to pick a row, filter the other rows that can match with it (ie. there is no items at the same position that are equal), and then recursively do the same thing n
times (the picked row are only the one that are filtered). The several sets of row indices can be appended in a list during the recursion. It is hard to evaluate the complexity of this solution, but it is far much faster in practice since most rows hopefully does not match together and the filtered rows tends to decrease exponentially too. That being said, the complexity is certainly still exponential since the size of the output appears to grow exponentially too and the output needs to be written.
Here is the implementation:
def recursiveFindSets(a, i, n, availableRows, rowIndices, results):
if availableRows.size == 0:
return
for k in availableRows:
# Save the current choice
rowIndices[i] = k
# The next selected rows needs to be bigger than `k` so to prevent replicates
newAvailableRows = availableRows[availableRows > k]
# If there is no solutions with a[k], then choose another
if newAvailableRows.size == 0:
continue
# Find rows that contains different items of a[i]
goodMatches = np.all(a[newAvailableRows] != a[k], axis=1)
# Find the location relative to `a` and not `a[availableRows]`
newAvailableRows = newAvailableRows[goodMatches]
# If there is no solutions with a[k], then choose another
if newAvailableRows.size == 0:
continue
if i == n-2:
# Generate some solutions from `newAvailableRows`
for k2 in newAvailableRows:
rowIndices[i+1] = k2
results.append(rowIndices.copy())
elif i < n-2:
recursiveFindSets(a, i+1, n, newAvailableRows, rowIndices, results)
def findSets(a, n):
availableRows = np.arange(a.shape[0], dtype=int) # Filter
rowIndices = np.empty(n, dtype=np.int_) # Current set of row indices
results = [] # List of all the sets
recursiveFindSets(a, 0, n, availableRows, rowIndices, results)
if len(results) == 0:
return np.empty((0, n), dtype=int)
return np.vstack(results)
findSets(a, 3)
# Output:
# array([[0, 4, 7],
# [2, 5, 7],
# [3, 4, 7]])
Upvotes: 1
Reputation: 11602
You can reduce this problem to finding all cliques of size n
in an undirected graph. Nodes in the graph are given by row indices of a
. There is an edge between i
and j
if (a[i] != a[j]).all()
.
Here is one implementation based on networkx. A function enumerate_all_cliques(g)
iterates over cliques in g
in order of increasing size. We discard all cliques of size less than n
, keep those of size n
, and stop once the first clique of size greater than n
is found or cliques run out.
from itertools import combinations
import networkx as nx
def f(arr, n):
nodes = np.arange(arr.shape[0])
g = nx.Graph()
g.add_nodes_from(nodes)
for i, j in combinations(nodes, 2):
if (arr[i] != arr[j]).all():
g.add_edge(i, j)
for c in nx.algorithms.clique.enumerate_all_cliques(g):
lc = len(c)
if lc == n:
yield c
elif lc > n:
break
print(list(f(a, 3)))
# [[0, 4, 7], [2, 5, 7], [3, 4, 7]]
Here is another approach: find all maximal cliques and yield all subsets of size n
from each clique. This can lead to double-counting, hence set
is used before the return statement.
def f(arr, n):
nodes = np.arange(arr.shape[0])
g = nx.Graph()
g.add_nodes_from(nodes)
for i, j in combinations(nodes, 2):
if (arr[i] != arr[j]).all():
g.add_edge(i, j)
cliques = set()
# iterate over maximal cliques
for c in nx.algorithms.clique.find_cliques(g):
# update the clique set subsets of c
cliques.update(map(frozenset, combinations(c, n)))
# return all cliques of size n without doublecounting
return [list(c) for c in cliques]
print(f(a, 3))
# [[2, 5, 7], [3, 4, 7], [0, 4, 7]]
The performance of either approach will vary depending on input values.
Upvotes: 1