Reputation: 189
I have a graph containing many genes and their interactions. I am interested in finding a subgraph with the maximum no .of a specific set of genes say, A, B, C, D, E in the graph.
Tried BFS algorithm and also connected components. But do not know how to find the subgraph of my genes of interest.
def bfs(G, gene, n):
"""
Using breadth-first search
returns a graph of breadth n starting at source gene.
"""
S = nx.algorithms.traversal.breadth_first_search.bfs_tree(G, source=gene, depth_limit=n)
return S
Given a graph G(V,E) with V vertices and E edges , I want to find a subgraph G'(v,e) where v is a subset of V, such that G' contains maximum of my nodes on interest.
Upvotes: 2
Views: 1989
Reputation: 23827
edit While I think my original code (now at the bottom) was good, I think you can do better using node_connected_component(G,u)
which returns the set of nodes in the same component as u
.
Let me explain the code below. First, I'm going to step through the interesting genes. With the first one, I look for the component of G
in which it sits, and then find all of the other interesting genes that are in the same component. Then when I look at each subsequent interesting gene, I make sure I haven't already encountered it in the same component as another. If it's a new component, I find all of the other interesting genes in the same component. At the end, I find the component that had the most interesting genes.
component_count = {}
seen = {}
for source in interesting_genes:
if source not in seen:
reachable_nodes = nx.node_connected_component(G, source)
reachable_nodes_of_interest = [target for target in interesting_genes if target in reachable_nodes]
for node in reachable_nodes_of_interest:
seen[node] = True
component_count = len(reachable_nodes_of_interest)
source = max(component_count, key=component_count.get) #finds the node with the largest component_count
Gprime = G.subgraph(nx.node_connected_component(G, source))
Take a look at the documentation for single_source_shortest_path_length
.
seen = {source:False for source in interesting_genes} #if we've found the component of a node, no need to recalculate it.
component_count = {} #will count how many other interesting genes there are in a component
for source in interesting_genes:
if not seen[source]:
reachable_nodes_dict = nx.single_source_shortest_path_length(G, node)
reachable_nodes_of_interest = [target for target in interesting_genes if target in reachable_nodes_dict]
for target in reachable_nodes_of_interest:
seen[target] = True
component_count[source] = len(reachable_nodes_of_interest)
source = max(component_count, key=component_count.get) #finds the node with the largest component_count
Gprime = G.subgraph(nx.node_connected_component(G, source))
Upvotes: 3