Reputation: 23
I have a matrix in this form:
In [1]: df = pd.DataFrame([[0, 0, 1, 1, 1], [0, 0, 0, 1, 0], [1, 0, 0, 0, 0], [1, 1, 0, 0, 1], [1, 0, 0, 1, 0]], columns = list('ABCDE'), index = list('ABCDE'))
In [2]:df
Out[2]:
A B C D E
A 0 0 1 1 1
B 0 0 0 1 0
C 1 0 0 0 0
D 1 1 0 0 1
E 1 0 0 1 0
The number 1 represents the connection between two elements. In this case 'A' is connected to 'D' and 'E' and 'D' are connected to 'E', forming a closed connection formed by three elements. I look for the greatest number of elements connected to each other, in this case 'A', 'D', 'E'. I could use for loops but it gets too slow for a 300x300 matrix. How can I solve?
Update 1:
df = pd.DataFrame([[0, 0, 1, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], [0, 0, 1, 1, 0, 0], [1, 0, 0, 0, 0, 0]], columns = list('ABCDEF'), index = list('ABCDEF'))
In this case the solution is the longest cycle ['D', 'F', 'A', 'C', 'E'].
Is it possible to have only full connected elements such as, in the case presented in the example, 'C', 'D', 'E'?
Upvotes: 2
Views: 234
Reputation: 88226
What you appear to be looking for IIUC, from a graph theory point of view, are the graph cycles. You can find the cycles by generating a graph from the adjacency matrix using NetworkX
:
import networkx as nx
G = nx.from_pandas_adjacency(df, create_using=nx.DiGraph)
nx.draw(G, with_labels=True, node_color='lightblue')
You can find the graph cycles via nx.simple_cycles
, and from there easily obtain the longest:
max(nx.simple_cycles(G), key=len)
# ['E', 'D', 'A']
Update -
If you want only fully connected elements in those "cycles", what you want are the graph cliques. You'll first have to convert to undirected, since nx.find_cliques
isn't defined for directed graphs, and then find the max
using len
as a key function:
H = G.to_undirected()
nx.draw(H, with_labels=True, node_color='lightblue')
max(nx.find_cliques(H), key=len)
# ['D', 'E', 'C']
Upvotes: 2