Reputation: 73
I have a Pandas dataframe (930 rows × 50 columns) that looks like this:
index | Keyword A | Keyword B | Keyword C |
---|---|---|---|
Page 1 | 1 | 3 | 1 |
Page 2 | 4 | 0 | 2 |
Page 3 | 0 | 1 | 1 |
I would like to convert it into an adjacency matrix (a weighted graph) in which each keyword is a node. The weight of each edge would be the sum, across all pages, of the combinations of the two keywords.
The result would be something along these lines:
| Keyword A | Keyword B | Keyword C |
---|---|---|---|
Keyword A | 0 | 3 | 9 |
Keyword B | 3 | 0 | 4 |
Keyword C | 9 | 4 | 0 |
Upvotes: 5
Views: 1338
Reputation: 117701
The solution is deceptively simple: multiply the transposed frame by the frame itself, then zero the diagonal:
adj = df.T @ df
np.fill_diagonal(adj.values, 0)
E.g.:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 3, 1], [2, 4, 0, 2], [3, 0, 1, 1]],
...                   columns=["index", "A", "B", "C"]).set_index("index")
>>> df
       A  B  C
index
1      1  3  1
2      4  0  2
3      0  1  1
>>> adj = df.T @ df
>>> np.fill_diagonal(adj.values, 0)
>>> adj
   A  B  C
A  0  3  9
B  3  0  4
C  9  4  0
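Why this works: entry (i, j) of df.T @ df is the sum over all pages of count_i × count_j, so each off-diagonal cell is the co-occurrence weight of two keywords, and the diagonal (each keyword with itself) is zeroed out. One caveat: depending on the pandas version and its Copy-on-Write setting, adj.values may come back as a read-only array, in which case filling it in place would raise an error. Below is a minimal sketch of a variant that sidesteps this by doing the product on a plain NumPy array first. The names counts and weights are only illustrative, and the data is the same toy frame as above.

```python
import numpy as np
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame(
    [[1, 3, 1], [4, 0, 2], [0, 1, 1]],
    index=pd.Index([1, 2, 3], name="index"),
    columns=["A", "B", "C"],
)

# Do the product on a plain NumPy array: the result is a fresh array
# that we own, so zeroing its diagonal never writes through adj.values.
counts = df.to_numpy()
weights = counts.T @ counts
np.fill_diagonal(weights, 0)

# Wrap the result back into a labelled frame (keywords on both axes).
adj = pd.DataFrame(weights, index=df.columns, columns=df.columns)

# Cross-check one cell against the definition: sum over pages of A * C.
assert adj.loc["A", "C"] == (df["A"] * df["C"]).sum()
print(adj)
```

If an actual graph object is needed and networkx is installed (an assumption, it is not mentioned in the question), nx.from_pandas_adjacency(adj) should accept a frame of this shape.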
Upvotes: 4