Reputation: 73
I want to draw a network diagram based on the correlations between columns. For example, my data has 200 rows and 100 columns; a sample is shown below:
A (Zone1) | B (Zone1) | C (Zone2) | D (Zone2) | E (Zone3) | F (Zone3) | G (Final) |
---|---|---|---|---|---|---|
2 | 23 | 21 | 4 | 4 | 34 | 33 |
4 | -2 | 7 | 3 | 10 | 4 | 12 |
23 | 21 | 4 | 4 | 34 | 33 | 12 |
10 | 4 | 12 | 0 | 4 | -2 | 7 |
The network I want to see groups the column names by zone and connects them according to their correlation values.
If two columns are not well correlated (correlation <= 0.3), they should not be connected. Is there an algorithm or a way to do this in Python?
Upvotes: 1
Views: 1404
Reputation: 15505
You can use the following tools:

- `pandas.read_csv` to read the data from the CSV file;
- `.corr` to get the pairwise correlations;
- `networkx` to build the graph.

```python
import itertools

import matplotlib.pyplot as plt
import networkx
import pandas

data = pandas.read_csv('data.csv')
vertices = data.columns.values.tolist()
# Pearson correlation for every unordered pair of columns
edges = [((u, v), data[u].corr(data[v])) for u, v in itertools.combinations(vertices, 2)]
# Keep only pairs above the threshold; the question treats <= 0.3 as "not connected"
edges = [(u, v, {'weight': c}) for (u, v), c in edges if c > 0.3]

G = networkx.Graph()
G.add_edges_from(edges)
networkx.draw(G, with_labels=True, font_weight='bold')
plt.show()
```
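A possible variation, as a rough sketch rather than a definitive solution: compute the whole correlation matrix once with `DataFrame.corr`, add every column as a node so that uncorrelated columns still show up, and tag each node with its zone. The helper names `zone_of` and `correlation_graph` are made up for illustration, and two things are assumptions on my part: that the zone is the text in parentheses at the end of each column name, and that a strong negative correlation should also count as a connection (hence the `abs`). The small data frame is invented just to show the call.

```python
import re

import networkx as nx
import pandas as pd


def zone_of(column):
    """Return the text in trailing parentheses, e.g. 'Zone1' for 'A (Zone1)'."""
    match = re.search(r"\(([^)]*)\)\s*$", column)
    return match.group(1) if match else None


def correlation_graph(data, threshold=0.3):
    """Build a graph with one node per column and one edge per correlated pair."""
    corr = data.corr()  # full pairwise Pearson correlation matrix
    columns = list(corr.columns)
    graph = nx.Graph()
    # Add every column first so that columns without any edge still appear.
    for column in columns:
        graph.add_node(column, zone=zone_of(column))
    for i, u in enumerate(columns):
        for v in columns[i + 1:]:
            c = corr.loc[u, v]
            # Assumption: abs() keeps strong negative correlations as well.
            # A NaN correlation fails the comparison and adds no edge.
            if abs(c) > threshold:
                graph.add_edge(u, v, weight=float(c))
    return graph


# Invented demo data, not the data from the question.
demo = pd.DataFrame({
    "A (Zone1)": [1, 2, 3, 4],
    "B (Zone1)": [2, 4, 6, 9],
    "C (Zone2)": [1, -1, -1, 1],
})
G = correlation_graph(demo)
```

The `zone` attribute stored on each node can then be mapped to colours and passed as `node_color` to `networkx.draw` if you want the zones to stand out in the picture.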
See also this question:
Upvotes: 2