monadoboi
monadoboi

Reputation: 1811

NetworkX - generating a random connected bipartite graph

I'm using NetworkX to generate a bipartite graph using either nx.bipartite.random_graph or nx.bipartite.gnmk_random_graph, as follows:

B = bipartite.gnmk_random_graph(5,6,10)
bottom_nodes, top_nodes = bipartite.sets(B)

However, I get an error:

networkx.exception.AmbiguousSolution: Disconnected graph: Ambiguous solution for bipartite sets.

It is just a single line, so I'm not sure how I could be doing this wrong and why their package would be returning (what I assume is) an invalid bipartite graph.

Thanks.

EDIT: I just realised that I need to specify a minimum number of edges/probability for the third argument.

E.g. bipartite.random_graph(5,6,0.6) and having p>0.5 gets rid of the error. Similarly, bipartite.gnmk_random_graph(5,6,11) where k>n+m. I didn't realise this was the case, as I assumed if the number of edges was lower than required to connect every vertex there would just be some floating vertices.

Thanks for your help!

Upvotes: 5

Views: 4133

Answers (2)

cglacet
cglacet

Reputation: 10912

Considering that you have a {5, 6} bipartite graph with only 10 edges, its very likely that you graph will be disconnected (it is so sparse that you even have a high probability of having isolated nodes).

import networkx as nx
import random

random.seed(0)

B = nx.bipartite.gnmk_random_graph(5,6,10)
isolated_nodes = set(B.nodes())
for (u, v) in B.edges():
  isolated_nodes -= {u}
  isolated_nodes -= {v}
print(isolated_nodes)

Will show you that node with id=1 is isolated. What you can do to make your graph connected is to only keep its largest connected component:

import networkx as nx
import random

random.seed(0)

B = nx.bipartite.gnmk_random_graph(5,6,11)
components = sorted(nx.connected_components(B), key=len, reverse=True)
largest_component = components[0]
C = B.subgraph(largest_component)

Which will here only remove node 1 (an isolated node).

Now the only question is "how random this new graph is". In other words, does it pick any graph in the set of random connected bipartite graphs with 5-6 nodes and 10 edges with equal probability. For now I'm not sure, but it looks decent I think.

Of course what you suggest (picking a graph until its connected) will be ok, but it can be costly (depending on the 3 parameters of course).

Edit I'm dumb, this can't be ok as the new graph doesn't have the right number of nodes/edges. But there should be a better solution than just retry until you get a good graph. Hmm that's interesting ...

2nd edit Maybe this answer could help in finding a good solution to this problem.

3rd edit and a suggestion

As you have noticed in the question I linked, the accepted answer is not really correct as the generated graph is not selected uniformly at random in the set of expected graphs. We can do something a bit similar here to have a first decent solution. The idea is to first create a connected bipartite graph with the minimum number of edges by iteratively picking isolated nodes and connected them to the other side of the bipartite graph. For that we will create two sets N and M, create a first edge from N to M. Then we will pick a random isolated node (from either N or M) and connected it to a random non-isolated node from the other side. Once we don't have any more isolated node we will have exactly n+m-1 edges, we will thus need to add k-(n+m-1) additional edges to the graph to match the original constraints.

Here is the code corresponding to that algorithm

import networkx as nx
import random

random.seed(0)

def biased_random_connected_bipartite(n, m, k):
  G = nx.Graph()

  # These will be the two components of the bipartite graph
  N = set(range(n)) 
  M = set(range(n, n+m))
  G.add_nodes_from(N)
  G.add_nodes_from(M)

  # Create a first random edge 
  u = random.choice(tuple(N))
  v = random.choice(tuple(M))
  G.add_edge(u, v)

  isolated_N = set(N-{u})
  isolated_M = set(M-{v})
  while isolated_N and isolated_M:
    # Pick any isolated node:
    isolated_nodes = isolated_N|isolated_M
    u = random.choice(tuple(isolated_nodes))

    # And connected it to the existing connected graph:
    if u in isolated_N:
      v = random.choice(tuple(M-isolated_M))
      G.add_edge(u, v)
      isolated_N.remove(u)
    else:
      v = random.choice(tuple(N-isolated_N))
      G.add_edge(u, v)
      isolated_M.remove(u)

  # Add missing edges
  for i in range(k-len(G.edges())):
    u = random.choice(tuple(N))
    v = random.choice(tuple(M))
    G.add_edge(u, v)

  return G

B = biased_random_connected_bipartite(5, 6, 11)

But I repeat, this graph is not select uniformly at random in the set of all possible bipartite graphs (with the constraints we defined on n, m and k). As I said it in the other post, this graph will tend to have some nodes with higher degree than other. This is because we connect isolated nodes to the connected component one by one, therefore nodes that have been added sooner in the process will tend to attract more nodes (preferential attachment). I asked the question on cstheory to see if any bright ideas come up.

edit I added another solution than the one presented here, it's a bit better but still not a good one.

Upvotes: 1

Joel
Joel

Reputation: 23827

SHORT ANSWER

You want to do

B = bipartite.gnmk_random_graph(5,6,10)
top = [node for node in B.nodes() if B.node[node]['bipartite']==0]
bottom = [node for node in B.nodes() if B.node[node]['bipartite']==1]

Explanation

So when you generate this bipartite graph, it is likely to be disconnected. Let's say it has 2 separate components X and Y. Both of these components are bipartite.

bipartite.sets(B) is supposed to determine which sets are the two partitions of B. But it's going to run into trouble.

Why?

X can be broken into two partitions X_1 and X_2 and Y can be broken into Y_1 and Y_2. What about B? Let top = X_1 + Y_1 and bottom = X_2 + Y_2. This is a perfectly legitimate partition. But top = X_1+Y_2 and bottom = X_2+Y_1 is also a perfectly legitimate partition. Which one should it return? It's ambiguous. The algorithm explicitly refuses to make a choice. Instead it gives you an error.

What to do? You could throw out B if it's disconnected and try again. But you're using B for something right? Is it reasonable to restrict your attention only to the connected graphs? Maybe, maybe not. That's something you need to figure out. But it is not reasonable to restrict your attention only to the connected graphs if the reason is that disconnected graphs are inconvenient. You seem to hit this error more often than not, so a large fraction of the graphs are disconnected --- you're throwing out a large fraction of the cases. It seems that this is likely to bias the final outcome of whatever you're doing. (similarly, if you take steps to connect your network, you're no longer getting random graphs from the original distribution because well, you've ensured they aren't disconnected, and even worse now - you may not be uniformly sampling from the connected graphs).

So what's a better option? After looking at the source code, I found that this method isn't documented as well as it should be. It turns out that that for

B = bipartite.gnmk_random_graph(5,6,10)

nodes 0 up to 4 (the first five) are in the top, and nodes 5 up to 10 (the next 6) are in the bottom.

Alternately you can get it directly from the data that is encoded in the graph B (not mentioned in the documentation). Try

B = bipartite.gnmk_random_graph(5,6,10)
B.nodes(data=True)
> NodeDataView({0: {'bipartite': 0}, 1: {'bipartite': 0}, 2: {'bipartite': 0}, 3: {'bipartite': 0}, 4: {'bipartite': 0}, 5: {'bipartite': 1}, 6: {'bipartite': 1}, 7: {'bipartite': 1}, 8: {'bipartite': 1}, 9: {'bipartite': 1}, 10: {'bipartite': 1}})

So it's actually storing which node is in which part. Let's use that (and a list comprehension)

top = [node for node in B.nodes() if B.node[node]['bipartite']==0]
bottom = [node for node in B.nodes() if B.node[node]['bipartite']==1]

Upvotes: 2

Related Questions