Reputation: 1811
I'm using NetworkX to generate a bipartite graph using either nx.bipartite.random_graph
or nx.bipartite.gnmk_random_graph
, as follows:
B = bipartite.gnmk_random_graph(5,6,10)
bottom_nodes, top_nodes = bipartite.sets(B)
However, I get an error:
networkx.exception.AmbiguousSolution: Disconnected graph: Ambiguous solution for bipartite sets.
It is just a single line, so I'm not sure how I could be doing this wrong and why their package would be returning (what I assume is) an invalid bipartite graph.
Thanks.
EDIT: I just realised that I need to specify a minimum number of edges/probability for the third argument.
E.g. bipartite.random_graph(5,6,0.6)
and having p>0.5
gets rid of the error. Similarly, bipartite.gnmk_random_graph(5,6,11)
where k>n+m
. I didn't realise this was the case, as I assumed if the number of edges was lower than required to connect every vertex there would just be some floating vertices.
Thanks for your help!
Upvotes: 5
Views: 4133
Reputation: 10912
Considering that you have a {5, 6} bipartite graph with only 10 edges, its very likely that you graph will be disconnected (it is so sparse that you even have a high probability of having isolated nodes).
import networkx as nx
import random
random.seed(0)
B = nx.bipartite.gnmk_random_graph(5,6,10)
isolated_nodes = set(B.nodes())
for (u, v) in B.edges():
isolated_nodes -= {u}
isolated_nodes -= {v}
print(isolated_nodes)
Will show you that node with id=1 is isolated. What you can do to make your graph connected is to only keep its largest connected component:
import networkx as nx
import random
random.seed(0)
B = nx.bipartite.gnmk_random_graph(5,6,11)
components = sorted(nx.connected_components(B), key=len, reverse=True)
largest_component = components[0]
C = B.subgraph(largest_component)
Which will here only remove node 1 (an isolated node).
Now the only question is "how random this new graph is". In other words, does it pick any graph in the set of random connected bipartite graphs with 5-6 nodes and 10 edges with equal probability. For now I'm not sure, but it looks decent I think.
Of course what you suggest (picking a graph until its connected) will be ok, but it can be costly (depending on the 3 parameters of course).
Edit I'm dumb, this can't be ok as the new graph doesn't have the right number of nodes/edges. But there should be a better solution than just retry until you get a good graph. Hmm that's interesting ...
2nd edit Maybe this answer could help in finding a good solution to this problem.
3rd edit and a suggestion
As you have noticed in the question I linked, the accepted answer is not really correct as the generated graph is not selected uniformly at random in the set of expected graphs. We can do something a bit similar here to have a first decent solution. The idea is to first create a connected bipartite graph with the minimum number of edges by iteratively picking isolated nodes and connected them to the other side of the bipartite graph. For that we will create two sets N
and M
, create a first edge from N
to M
. Then we will pick a random isolated node (from either N
or M
) and connected it to a random non-isolated node from the other side. Once we don't have any more isolated node we will have exactly n+m-1 edges, we will thus need to add k-(n+m-1) additional edges to the graph to match the original constraints.
Here is the code corresponding to that algorithm
import networkx as nx
import random
random.seed(0)
def biased_random_connected_bipartite(n, m, k):
G = nx.Graph()
# These will be the two components of the bipartite graph
N = set(range(n))
M = set(range(n, n+m))
G.add_nodes_from(N)
G.add_nodes_from(M)
# Create a first random edge
u = random.choice(tuple(N))
v = random.choice(tuple(M))
G.add_edge(u, v)
isolated_N = set(N-{u})
isolated_M = set(M-{v})
while isolated_N and isolated_M:
# Pick any isolated node:
isolated_nodes = isolated_N|isolated_M
u = random.choice(tuple(isolated_nodes))
# And connected it to the existing connected graph:
if u in isolated_N:
v = random.choice(tuple(M-isolated_M))
G.add_edge(u, v)
isolated_N.remove(u)
else:
v = random.choice(tuple(N-isolated_N))
G.add_edge(u, v)
isolated_M.remove(u)
# Add missing edges
for i in range(k-len(G.edges())):
u = random.choice(tuple(N))
v = random.choice(tuple(M))
G.add_edge(u, v)
return G
B = biased_random_connected_bipartite(5, 6, 11)
But I repeat, this graph is not select uniformly at random in the set of all possible bipartite graphs (with the constraints we defined on n, m and k). As I said it in the other post, this graph will tend to have some nodes with higher degree than other. This is because we connect isolated nodes to the connected component one by one, therefore nodes that have been added sooner in the process will tend to attract more nodes (preferential attachment). I asked the question on cstheory to see if any bright ideas come up.
edit I added another solution than the one presented here, it's a bit better but still not a good one.
Upvotes: 1
Reputation: 23827
SHORT ANSWER
You want to do
B = bipartite.gnmk_random_graph(5,6,10)
top = [node for node in B.nodes() if B.node[node]['bipartite']==0]
bottom = [node for node in B.nodes() if B.node[node]['bipartite']==1]
Explanation
So when you generate this bipartite graph, it is likely to be disconnected. Let's say it has 2 separate components X
and Y
. Both of these components are bipartite.
bipartite.sets(B)
is supposed to determine which sets are the two partitions of B
. But it's going to run into trouble.
Why?
X
can be broken into two partitions X_1
and X_2
and Y
can be broken into Y_1
and Y_2
. What about B
? Let top = X_1 + Y_1
and bottom = X_2 + Y_2
. This is a perfectly legitimate partition. But top = X_1+Y_2
and bottom = X_2+Y_1
is also a perfectly legitimate partition. Which one should it return? It's ambiguous. The algorithm explicitly refuses to make a choice. Instead it gives you an error.
What to do? You could throw out B
if it's disconnected and try again. But you're using B
for something right? Is it reasonable to restrict your attention only to the connected graphs? Maybe, maybe not. That's something you need to figure out. But it is not reasonable to restrict your attention only to the connected graphs if the reason is that disconnected graphs are inconvenient. You seem to hit this error more often than not, so a large fraction of the graphs are disconnected --- you're throwing out a large fraction of the cases. It seems that this is likely to bias the final outcome of whatever you're doing. (similarly, if you take steps to connect your network, you're no longer getting random graphs from the original distribution because well, you've ensured they aren't disconnected, and even worse now - you may not be uniformly sampling from the connected graphs).
So what's a better option? After looking at the source code, I found that this method isn't documented as well as it should be. It turns out that that for
B = bipartite.gnmk_random_graph(5,6,10)
nodes 0
up to 4
(the first five) are in the top, and nodes 5
up to 10
(the next 6) are in the bottom.
Alternately you can get it directly from the data that is encoded in the graph B
(not mentioned in the documentation). Try
B = bipartite.gnmk_random_graph(5,6,10)
B.nodes(data=True)
> NodeDataView({0: {'bipartite': 0}, 1: {'bipartite': 0}, 2: {'bipartite': 0}, 3: {'bipartite': 0}, 4: {'bipartite': 0}, 5: {'bipartite': 1}, 6: {'bipartite': 1}, 7: {'bipartite': 1}, 8: {'bipartite': 1}, 9: {'bipartite': 1}, 10: {'bipartite': 1}})
So it's actually storing which node is in which part. Let's use that (and a list comprehension)
top = [node for node in B.nodes() if B.node[node]['bipartite']==0]
bottom = [node for node in B.nodes() if B.node[node]['bipartite']==1]
Upvotes: 2