Reputation: 133
Is there a Python package for community detection that I can simply use?
Here is my dataset, just a CSV file containing some relationship records.
A,B
B,E
K,L
Q,W
P,Y
W,P
...
Each line means that one person has a relationship with another; for example, 'A,B' means A has a relationship with B. Reversed pairs do not appear: 'B,A' will never be in this dataset because 'A,B' already exists.
So in this example, all the nodes that can be linked together are considered one community. After grouping, the relationships look like this:
Community 1: A--B--E
Community 2: K--L
Community 3: Q--W--P--Y
I want output like this: all the communities and all the members of each community.
I know this kind of function can easily be achieved with a graph database such as neo4j. But how can I implement it in Python? Is there a package, like those in sklearn, that can do this?
Upvotes: 2
Views: 401
Reputation: 262634
This is a graph problem.
Assuming this DataFrame as input:
  source target
0      A      B
1      B      E
2      K      L
3      Q      W
4      P      Y
5      W      P
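To get from the CSV in the question to this DataFrame, here is a minimal sketch. It assumes the file has no header row, and the column names `source` and `target` are just labels chosen to match the table above. The inline `csv_text` string stands in for the real file, whose path is not given in the question.

```python
import io

import pandas as pd

# Stand-in for the CSV file from the question (assumed: no header row).
csv_text = "A,B\nB,E\nK,L\nQ,W\nP,Y\nW,P\n"

# For a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["source", "target"])
print(df)
```

With `header=None`, pandas treats the first line as data rather than as column names, so the 'A,B' record is kept.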
Your graph is made up of separate groups of linked nodes, and you want to identify those subgroups to form a Series/DataFrame.
You can use networkx.connected_components:
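# pip install networkx
import networkx as nx
import pandas as pd

# Build an undirected graph from the edge list
G = nx.from_pandas_edgelist(df, source='source', target='target')
# One entry per connected component, labelled from 1
pd.Series({f'Community {i}': g
           for i, g in enumerate(nx.connected_components(G), start=1)})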
Output:
Community 1       {A, E, B}
Community 2          {L, K}
Community 3    {P, W, Y, Q}
dtype: object
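Since each community here is just a connected component, the same grouping can also be sketched without any third-party package. The following is an illustrative alternative, not what networkx does internally: the helper name `connected_groups` is hypothetical, and it runs a plain breadth-first search over an adjacency map built from the pairs in the question.

```python
from collections import defaultdict, deque


def connected_groups(edges):
    """Return the connected components of an undirected graph as sorted lists."""
    # Treat every pair as an undirected edge.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in adjacency:  # nodes in order of first appearance
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(members))
    return groups


# The pairs from the question's sample data.
edges = [("A", "B"), ("B", "E"), ("K", "L"), ("Q", "W"), ("P", "Y"), ("W", "P")]
for i, members in enumerate(connected_groups(edges), start=1):
    print(f"Community {i}: {', '.join(members)}")
```

Members are sorted alphabetically within each community, so they are not listed in chain order like 'Q--W--P--Y' in the question. For large graphs, networkx is likely the more convenient choice.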
Upvotes: 2