Reputation: 18909
Given a list of sets like:
sets=[{1,2},{2,3},{1,3}]
the product (1,2,3)
will be generated twice in itertools.product(*sets)
, as the literals (1,2,3)
and (2,3,1)
, because there is a loop. If there is no loop there will be no duplication, even though there might be lots of commonality between sets.
A loop is formed to A in a set when you travel to B in the same set and then to B in another set that has A or to B in another set with C which connects to a set with A. e.g. 1>2--2>3--3>1
where '--' indicates movement between sets and '>' indicates movement within the set. The smallest loop would involve a pair of numbers in common between two sets, e.g. a>b--b>a
. {edit: @ravenspoint's notation is nice, I suggest using {a}-b-{a}
instead of the above.} Loops in canonical form should not have a bridging value used more than once: either this represents a case where the loop traced back on itself (like in a cursive "i") or there is a smaller loops that could be made (like the outer and inner squares on the Arch de Triumph](https://commons.wikimedia.org/wiki/File:Paris_Arc_de_Triomphe.jpg).
What type of graph structure could I be using to represent this? I have tried representing each set as a node and then indicating which sets are connected to which, but this is not right since for [{1,2},{1,3},{1,4}]
, there is a connection between all sets -- the common 1-- but there is no loop. I have also tried assigning a letter to each number in each set, but that doesn't seem right, either, since then I don't know how to discriminate against loops within a set.
This was motivated by this question about generating unique products.
Sample sets like the following (which has the trivial loop 4>17--17>4
and longer loops like 13>5--5>11--11>13
)
[{1, 13, 5}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}]
can be generated as shown in the docstring of core.
Another way to visualize the "path/loop" is to think of coloring points on a grid: columns contain elements of the sets and equal elements are in the same row. A loop is a path that starts at one point and ends at the same point by travelling vertically or horizontally from point to point and must include both directions of motion. A suitable permutation of rows and columns would reveal a staircase polygon.
Upvotes: 4
Views: 202
Reputation: 24518
Unlike the accepted answer, you don't need any external library to solve this problem. Note that in order to find a loop, we need to be able to traverse from one number to all the other numbers in the sets that contain this number. Naturally, we can think of a adjacency list representation where the keys are the individual numbers, and the values are the set indices that contains these numbers.
Example:
Given [{1, 2} ,{2, 3}, {1, 3}]
, the graph would be {1: {0, 2}, 2: {0, 1}, 3: {1, 2}}
.
This graph may be built explicitly, as in the code I show below, or by iterating over the given sets for each number and index pair.
Then we simply run a DFS starting from each number, and check if we can form a loop going from a number in this set to another number in a different set, and back to this number. More formally, we are looking for a loop like u ∈ i -> v ∈ j -> u ∈ k
, where i != j
and j!= k
.
Now, the code:
from collections import defaultdict
def has_loop(edges):
def dfs(u, i):
if colors.get(u, 0) == 2:
return False
if colors.get(u, 0) == 1:
return True
colors[u] = 1
if any(
dfs(v, j)
for j in graph[u]
for v in edges[j]
if i != j and u != v
):
return True
colors[u] = 2
return False
colors = {}
graph = defaultdict(set)
for i, us in enumerate(edges):
for u in us:
graph[u].add(i)
if any(dfs(u, i) for u in list(graph) for i in graph[u]):
return True
return False
Sample run:
for edges in (
[{1, 13, 5}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}],
[{1, 2}, {1, 3}, {1, 4}],
[{1, 2}, {2, 3},{1, 3}]
):
print(f"edges={edges}, loop={has_loop(edges)}")
Output:
edges=[{1, 5, 13}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}], loop=True
edges=[{1, 2}, {1, 3}, {1, 4}], loop=False
edges=[{1, 2}, {2, 3}, {1, 3}], loop=True
Upvotes: 1
Reputation: 18909
In my comment to this answer I indicate a way to use edges to try to see if there is a loop. For some sets, however, this can take a long time. For example, the presence of a loop in the following is not detected until nearly 470k products of edges have been tested (about 6 seconds).
D = [{1, 2, 3, 4, 5, 6, 7, 8}, {4, 11, 13, 14, 15, 16, 17}, {11, 20, 21, 23, 24, 28, 29}, {32, 34, 3, 37, 29, 30, 31}, {32, 39, 40, 41, 44, 7, 15, 20}, {6, 39, 47, 48, 16, 52, 53, 24}, {5, 47, 58, 61, 31}, {34, 8, 15, 52, 21, 58, 62, 63}, {37, 41, 14, 78, 23, 63}, {1, 40, 48, 82, 17, 23}]
Alternatively, one can just generate products and keep sorted results and see if a duplicate is ever detected...maybe you will be lucky and find a duplicate right away, but maybe you won't. For example, these sets show a duplicate result after about the 24k product from product(*C)
(where each result is put in a set as tuple(sorted(i))
:
C = [{1, 2, 3, 4, 5, 6}, {9, 12, 13, 14, 15, 16, 17, 18}, {2, 21, 22, 23, 24, 25, 27}, {35, 14, 21, 28, 30, 31}, {1, 35, 37, 38, 43, 12, 44, 25}, {44, 45, 15, 52, 53, 30}, {5, 45, 13, 22, 57, 58}, {37, 6, 18, 23, 59, 62}, {3, 16, 53, 59, 63}, {38, 9, 73, 24, 58, 28}]
I think this situation can be represented as a graph as follows:
For example, for the sets {1,2,4},{2,3},{1,3} the pairs are
[{(1,2),(1,4),(2,4)},{(2,3)},{(1,3)}]
So the vertices are
V = {
(1,2):-1, (1,4):-2, (2,4):-3,
(2,3):-4,
(1,3):-5
}
The edges are
E = [(-1,-4),(-1,-5),(-2,-5),(-3,-4),(-4,-5)]
And now you can analyze this graph however you like.
But it is not necessary to generate all the edges to find out if there is a loop. It might not even be a good idea to generate this graph structure. Consider the case for D.
These are the edges
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (1, 16), (1, 17), (1, 4), (1, 11), (1, 13), (1, 14), (1, 15), (2, 20), (2, 21), (2, 23), (2, 24), (2, 11), (2, 28), (2, 29), (3, 32), (3, 34), (3, 3), (3, 37), (3, 29), (3, 30), (3, 31), (4, 32), (4, 7), (4, 40), (4, 41), (4, 39), (4, 44), (4, 15), (4, 20), (5, 6), (5, 39), (5, 47), (5, 48), (5, 16), (5, 52), (5, 53), (5, 24), (6, 5), (6, 58), (6, 31), (6, 61), (6, 47), (7, 34), (7, 8), (7, 15), (7, 52), (7, 21), (7, 58), (7, 62), (7, 63), (8, 23), (8, 37), (8, 78), (8, 41), (8, 14), (8, 63), (9, 48), (9, 1), (9, 17), (9, 82), (9, 23), (9, 40)]
If we number them like this
{(0, 1): 0, (0, 2): 1, (0, 3): 2, (0, 4): 3, (0, 5): 4, (0, 6): 5, (0, 7): 6, (0, 8): 7, (1, 4): 8, (1, 11): 9, (1, 13): 10, (1, 14): 11, (1, 15): 12, (1, 16): 13, (1, 17): 14, (2, 11): 15, (2, 20): 16, (2, 21): 17, (2, 23): 18, (2, 24): 19, (2, 28): 20, (2, 29): 21, (3, 3): 22, (3, 29): 23, (3, 30): 24, (3, 31): 25, (3, 32): 26, (3, 34): 27, (3, 37): 28, (4, 7): 29, (4, 15): 30, (4, 20): 31, (4, 32): 32, (4, 39): 33, (4, 40): 34, (4, 41): 35, (4, 44): 36, (5, 6): 37, (5, 16): 38, (5, 24): 39, (5, 39): 40, (5, 47): 41, (5, 48): 42, (5, 52): 43, (5, 53): 44, (6, 5): 45, (6, 31): 46, (6, 47): 47, (6, 58): 48, (6, 61): 49, (7, 8): 50, (7, 15): 51, (7, 21): 52, (7, 34): 53, (7, 52): 54, (7, 58): 55, (7, 62): 56, (7, 63): 57, (8, 14): 58, (8, 23): 59, (8, 37): 60, (8, 41): 61, (8, 63): 62, (8, 78): 63, (9, 1): 64, (9, 17): 65, (9, 23): 66, (9, 40): 67, (9, 48): 68, (9, 82): 69}
We can identify the graph in which we are seeking loops
{0: [1, 2, 3, 4, 5, 6, 7, 64], 1: [0, 2, 3, 4, 5, 6, 7], 2: [0, 1, 3, 4, 5, 6, 7, 22], 3: [0, 1, 2, 4, 5, 6, 7, 8], 4: [0, 1, 2, 3, 5, 6, 7, 45], 5: [0, 1, 2, 3, 4, 6, 7, 37], 6: [0, 1, 2, 3, 4, 5, 7, 29], 7: [0, 1, 2, 3, 4, 5, 6, 50], 13: [14, 8, 9, 10, 11, 12, 38], 14: [13, 8, 9, 10, 11, 12, 65], 8: [13, 14, 9, 10, 11, 12, 3], 9: [13, 14, 8, 10, 11, 12, 15], 10: [13, 14, 8, 9, 11, 12], 11: [13, 14, 8, 9, 10, 12, 58], 12: [13, 14, 8, 9, 10, 11, 30, 51], 16: [17, 18, 19, 15, 20, 21, 31], 17: [16, 18, 19, 15, 20, 21, 52], 18: [16, 17, 19, 15, 20, 21, 59, 66], 19: [16, 17, 18, 15, 20, 21, 39], 15: [16, 17, 18, 19, 20, 21, 9], 20: [16, 17, 18, 19, 15, 21], 21: [16, 17, 18, 19, 15, 20, 23], 26: [27, 22, 28, 23, 24, 25, 32], 27: [26, 22, 28, 23, 24, 25, 53], 22: [26, 27, 28, 23, 24, 25, 2], 28: [26, 27, 22, 23, 24, 25, 60], 23: [26, 27, 22, 28, 24, 25, 21], 24: [26, 27, 22, 28, 23, 25], 25: [26, 27, 22, 28, 23, 24, 46], 32: [29, 34, 35, 33, 36, 30, 31, 26], 29: [32, 34, 35, 33, 36, 30, 31, 6], 34: [32, 29, 35, 33, 36, 30, 31, 67], 35: [32, 29, 34, 33, 36, 30, 31, 61], 33: [32, 29, 34, 35, 36, 30, 31, 40], 36: [32, 29, 34, 35, 33, 30, 31], 30: [32, 29, 34, 35, 33, 36, 31, 12, 51], 31: [32, 29, 34, 35, 33, 36, 30, 16], 37: [40, 41, 42, 38, 43, 44, 39, 5], 40: [37, 41, 42, 38, 43, 44, 39, 33], 41: [37, 40, 42, 38, 43, 44, 39, 47], 42: [37, 40, 41, 38, 43, 44, 39, 68], 38: [37, 40, 41, 42, 43, 44, 39, 13], 43: [37, 40, 41, 42, 38, 44, 39, 54], 44: [37, 40, 41, 42, 38, 43, 39], 39: [37, 40, 41, 42, 38, 43, 44, 19], 45: [48, 46, 49, 47, 4], 48: [45, 46, 49, 47, 55], 46: [45, 48, 49, 47, 25], 49: [45, 48, 46, 47], 47: [45, 48, 46, 49, 41], 53: [50, 51, 54, 52, 55, 56, 57, 27], 50: [53, 51, 54, 52, 55, 56, 57, 7], 51: [53, 50, 54, 52, 55, 56, 57, 12, 30], 54: [53, 50, 51, 52, 55, 56, 57, 43], 52: [53, 50, 51, 54, 55, 56, 57, 17], 55: [53, 50, 51, 54, 52, 56, 57, 48], 56: [53, 50, 51, 54, 52, 55, 57], 57: [53, 50, 51, 54, 52, 55, 56, 62], 59: [60, 63, 61, 58, 62, 18, 66], 60: [59, 63, 61, 58, 62, 28], 63: [59, 60, 61, 58, 62], 61: [59, 60, 63, 58, 62, 35], 58: [59, 60, 63, 61, 62, 11], 62: [59, 60, 63, 61, 58, 57], 68: [64, 65, 69, 66, 67, 42], 64: [68, 65, 69, 66, 67, 0], 65: [68, 64, 69, 66, 67, 14], 69: [68, 64, 65, 66, 67], 66: [68, 64, 65, 69, 67, 18, 59], 67: [68, 64, 65, 69, 66, 34]}
Finding the loops in that seems like it will be difficult and much information about the original problem has been lost. e.g. we are not interested in loops within a set.
If you are just trying to see if there is a loop, it is much easier to do something like this:
The detection of a loop can be accomplished in two steps: pare away unnecessary elements from the sets which cannot be involved in loops and then for each set, see if there is a connection to another set that links back to the home/starting to complete a loop.
from collections import Counter
def core(L):
"""Remove elements and sets that cannot impact the presence of a path.
- Elements that appear only once across all sets are removed (singletons).
- Sets with only one element are removed (dead ends).
Parameters
----------
sets : list of sets
The input list of sets.
Returns
-------
list of sets
The updated list of sets, with irrelevant information removed.
Returns an empty list if no valid sets remain.
Examples
========
>>> from random import randint
>>> d=[{randint(1,20) for i in range(5)} for i in range(4)]; d
[{1, 5, 6, 13, 19}, {2, 7, 11, 12, 13}, {4, 5, 10, 11, 17}, {16, 17, 4, 1}]
>>> core(d)
[{1, 13, 5}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}]
"""
base = [set(i) for i in L]
while 1:
was = base
if len(base)<2 or all(len(i)<2 for i in base):
return []
m = Counter(e for l in L for e in l)
# remove singletons
base = [i - {e for e in s if m[e] == 1} for s in base]
# remove dead ends
base = [s for s in base if len(s) > 1]
if base == was:
return base
def loops(L):
"""Determine if a path exists between sets in a list of sets.
A path is formed by starting at one set, traveling to another set that shares
at least one element, and continuing until either:
- A set is reached that links back to the starting set (a loop is formed).
- No further connections are possible (dead end).
Parameters
----------
sets : list of sets
The input list of sets.
Returns
-------
list
a list of (i, e) tuples showing the bridge values
at each step of the path where i is the set index in the list
and e is the value in that list bridging to the next indicated
set.
Examples
========
In the following, there is a loop 2>1--1>3--3>2
where ">" indicates a connection within a set and
"--" indicates a bridge between sets.
>>> loops([(1,2),(2,3),(1,3)])
[(0, 2), (2, 1), (1, 3)]
There is a longer loop here:
>>> d = [{2, 4, 5, 6, 7, 8, 9}, {9, 12, 14, 15, 18, 19}, {14, 21, 22, 24, 28}, {32, 34, 36, 37, 6, 19, 30, 31},
... {37, 38, 39, 40, 43, 8, 44, 22}, {40, 50, 51, 52, 24, 31}, {32, 5, 38, 12, 21, 53, 54, 55},
... {36, 39, 7, 55, 57, 63}, {65, 34, 66, 44, 15, 51, 54, 57}, {65, 2, 76, 52}]
>>> loops(d)
[(0, 7), (9, 2), (8, 65), (7, 57)]
"""
L = [set(i) for i in L]
n = len(L)
for home in range(n - 1):
# this is going to change the reference on sets
# but now that we are moving to a new home row,
# it is advantageous to update the core so we
# are not working with elements that are no longer
# connected; alternatively, any element that appeared
# only in home and one other place could be removed.
L = core(L[home:])
if len(L) < 1:
return False
indices = range(home, n)
vis = [0]
hit = [[i for i in indices if i not in vis and L[0] & L[i]]]
while hit:
j = hit[-1].pop()
if not hit[-1]:
hit.pop()
vis.append(j)
home_hit = L[0] & L[j]
if home_hit:
# there is a loop for {1,2},{1,2,3} between 1 and 2
# in this case the home_hit would be at least of length 2
# there is a longer loop for {1,2},{2,3},{1,3}
# in this case the home_hit would be at least of length 1
if (len(home_hit) > 1 or len(vis) > 2): # a loop to home
V = [home+i for i in vis] # true path indices
if len(vis) > 2:
EV = V[-1:] + V # wrap end around to front
path = [min(L[i] & L[j]) for i,j in zip(EV, V)]
else:
path = sorted(home_hit)
how = list(zip(V, path))
return how
h = [i for i in indices if i not in vis and L[j] & L[i]]
if h:
hit.append(h)
else:
vis.pop()
def renum(L):
"""return n and sets with elements expressed as integers [1,n]
Examples
========
>>> d=[{1, 2, 3, 4, 5, 6, 7, 8}, {4, 11, 13, 14, 15, 16, 17}]
>>> renum(d)
(14, [{1, 2, 3, 4, 5, 6, 7, 8}, {4, 9, 10, 11, 12, 13, 14}])
"""
g = defaultdict(int)
for l in L:
for e in l:
if e not in g:
g[e] = len(g)+1
return len(g), [{g[i] for i in l} for l in L]
def rref_loop(L):
"""hypothesis: if the rref form of the adjaceny matrix has
nonzero entries in the last row, then there is a loop. Otherwise
there *might* be a loop.
"""
from sympy import Matrix
c, l = renum(L)
m = Matrix.zeros(len(l), c+1)
r = 0
num = {}
for i in l:
for e in i:
m[r, e] = 1
r+=1
if any(m.rref()[0][-1,:]):
return True
Both loops
(and rref_loops
in this case) indicate that loops are present almost instantly.
If that idea of loops
is developed it can be used to find 24 loops in D in about 0.5 milliseconds.
Upvotes: 1
Reputation: 20410
To form a loop you must travel from one number ( A ) in a set to another number ( B ) in the set, then to the same number ( B ) in another set
So, we cannot just connect pairs of sets that share one or more numbers. We must connect a pair of sets that share one or more numbers with one or more edges that are labelled with the number that is used to connect them. Now we insist that when we travel through a node, we cannot use an out edge with the same label as the in edge that was used.
Here is the code to implement this graph generator https://codeberg.org/JamesBremner/so79339486/src/commit/f53b941cd6561f01c81af1194ff5922289174157/src/main.cpp#L32-L50
void genGraph()
{
raven::set::cRunWatch aWatcher("genGraph");
theGD.g.clear();
for (int is1 = 0; is1 < theSets.size(); is1++)
{
for (int is2 = is1 + 1; is2 < theSets.size(); is2++)
{
for (int n1 : theSets[is1])
for (int n2 : theSets[is2])
if (n1 == n2)
theGD.setEdgeWeight(
theGD.g.add(
"set" + std::to_string(is1),
"set" + std::to_string(is2)),
n1);
}
}
}
for the sample D = [{1, 2, 3, 4, 5, 6, 7, 8}, ...
the graph looks like:
The graph generation is done in less than a millisecond
raven::set::cRunWatch code timing profile
Calls Mean (secs) Total Scope
1 0.000856 0.000856 genGraph
The algorithm to find cycles ( == loops ) in this problem is a modified depth first search. Two modifications are required.
The BFS cannot travel along two successive edges with the same label
When a previously visited vertex is reached, the Dijsktra algorithm is applied to find the path that forms the shortest cycle that leads back to the back edge of the previously visited vertex.
Implementing this sort of thing in Python is not recommended - the performance is painfully slow.
Here is some C++ code to implement the modified DFS to prevent using edges with the same label twice in a row.
https://codeberg.org/JamesBremner/so79339486/src/branch/main/src/cycleFinder.cpp
The output, when run on example D, is
24 loops found
set0 -1- set9 -0- set2 -0- set7 -8- set0
set0 -1- set9 -0- set2 -0- set3 -31- set6 -5- set0
set0 -1- set9 -0- set2 -0- set5 -6- set0
set7 -0- set2 -0- set5 -52- set7
set9 -0- set2 -0- set5 -48- set9
set0 -1- set9 -0- set2 -0- set4 -7- set0
set7 -0- set2 -0- set4 -15- set7
set8 -0- set9 -0- set2 -0- set4 -41- set8
set9 -0- set2 -0- set4 -40- set9
set0 -1- set9 -0- set2 -0- set3 -3- set0
set7 -0- set2 -0- set3 -34- set7
set8 -0- set9 -0- set2 -0- set3 -37- set8
set8 -0- set9 -0- set2 -23- set8
set0 -1- set9 -0- set2 -11- set1 -4- set0
set4 -0- set2 -0- set9 -1- set0 -4- set1 -15- set4
set5 -0- set2 -0- set9 -1- set0 -4- set1 -16- set5
set7 -0- set2 -0- set9 -1- set0 -4- set1 -15- set7
set8 -0- set9 -1- set0 -4- set1 -14- set8
set9 -1- set0 -4- set1 -17- set9
set4 -0- set2 -0- set3 -32- set4
set4 -0- set2 -0- set5 -39- set4
set6 -5- set0 -1- set9 -0- set2 -0- set5 -47- set6
set6 -5- set0 -1- set9 -0- set2 -0- set7 -58- set6
set8 -0- set9 -0- set2 -0- set7 -63- set8
raven::set::cRunWatch code timing profile
Calls Mean (secs) Total Scope
1 0.0088616 0.0088616 cycleFinder
1 0.0009284 0.0009284 genGraph
-N- means that the loop has taken N as the common number to travel between sets.
The cycle finder takes 10 milliseconds on this problem. That seems plenty fast enough, but a simple optimization might be well worthwhile if you have problems larger than this. I believe that the presence of one loop means that the number sets fail. If so, then we could abort the loop finding as soon as one loop is found.
Another small test problem {1, 2, 4}, {2, 3}, {1, 3, 5}, {4, 5}
2 loops found
set0 -2- set1 -3- set2 -1- set0
set3 -4- set0 -1- set2 -5- set3
raven::set::cRunWatch code timing profile
Calls Mean (secs) Total Scope
1 0.0007017 0.0007017 cycleFinder
1 0.0003599 0.0003599 genGraph
Upvotes: 2
Reputation: 367
from collections import defaultdict
def solve(sets):
def dfs(node, vis, parent):
vis.add(node)
for nei in G[node]:
if nei not in vis:
if dfs(nei, vis, node):
return True
elif nei != parent:
return True
return False
G = defaultdict(set)
for s in sets:
for num in s:
G[num].update(s - {num})
vis = set()
for node in G:
if node not in vis:
if dfs(node, vis, None):
return True
return False
print(solve([{1, 2}, {2, 3}, {1, 3}]))
print(solve([{1, 2}, {1, 3}, {1, 4}]))
True
False
import networkx as nx
import matplotlib.pyplot as plt
def draw_graph(sets):
G = nx.Graph()
for s in sets:
for num in s:
for other in s - {num}:
G.add_edge(num, other)
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color='gray', edge_color='blue', node_size=2000, font_size=12)
plt.show()
draw_graph([{1, 2}, {2, 3}, {1, 3}])
draw_graph([{1, 2}, {1, 3}, {1, 4}])
Upvotes: 1