suraj jadhav

Reputation: 73

Is there a way to draw a network diagram using correlation coefficients as the basis of connections?

I want to draw a network diagram based on the correlations between columns. For example, my data has 200 rows and 100 columns; a sample is below:

A (Zone1)  B (Zone1)  C (Zone2)  D (Zone2)  E (Zone3)  F (Zone3)  G (Final)
2          23         21         4          4          34         33
4          -2         7          3          10         4          12
23         21         4          4          34         33         12
10         4          12         0          4          -2         7

The network I want to see arranges the column names zone by zone and connects them according to their correlation values.

If two columns are not well correlated (correlation <= 0.3), they should not be connected. Is there an algorithm or other way to do this in Python?

Upvotes: 1

Views: 1404

Answers (1)

Stef

Reputation: 15505

You can do this with pandas, itertools, networkx and matplotlib:

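import pandas
import itertools
import networkx
import matplotlib.pyplot as plt

# Load the table; every column name becomes a vertex.
data = pandas.read_csv('data.csv')

vertices = data.columns.values.tolist()
# Pearson correlation for every unordered pair of columns.
edges = [((u,v),data[u].corr(data[v])) for u,v in itertools.combinations(vertices, 2)]
# Keep only the pairs whose correlation reaches the 0.3 threshold.
edges = [(u,v,{'weight': c}) for (u,v),c in edges if c >= 0.3]

# Only columns that appear in at least one edge end up in the graph.
G = networkx.Graph()
G.add_edges_from(edges)

networkx.draw(G, with_labels=True, font_weight='bold')
plt.show()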

Graph with edges if correlation >= 0.3
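
If you also want the zone-wise arrangement from the question, and want weakly correlated columns to stay visible as isolated nodes, something along these lines might work. This is only a sketch under assumptions: the helper names zone_of, build_correlation_graph and draw_by_zone are made up for illustration, the zone is assumed to be the text in parentheses in each column name, all columns are assumed to be numeric, and an edge is drawn only when the correlation is strictly greater than 0.3, as the question describes.

```python
import networkx as nx
import pandas as pd


def zone_of(column_name):
    """Hypothetical helper: return the text inside the parentheses of a name
    such as "A (Zone1)", or "unknown" when there are no parentheses."""
    start = column_name.find("(")
    end = column_name.find(")", start)
    if start == -1 or end == -1:
        return "unknown"
    return column_name[start + 1:end].strip()


def build_correlation_graph(data, threshold=0.3):
    """Build a graph with one node per column and an edge wherever the
    Pearson correlation is strictly greater than `threshold`."""
    corr = data.corr()  # assumes every column is numeric
    cols = list(corr.columns)

    G = nx.Graph()
    # Add every column first so uncorrelated columns remain as isolated nodes.
    for col in cols:
        G.add_node(col, zone=zone_of(col))

    for i, u in enumerate(cols):
        for v in cols[i + 1:]:
            c = corr.loc[u, v]
            # NaN correlations (e.g. constant columns) fail this test and are skipped.
            if c > threshold:
                G.add_edge(u, v, weight=float(c))
    return G


def draw_by_zone(G):
    """Draw the graph with one layer of nodes per zone."""
    import matplotlib.pyplot as plt  # imported here so building the graph needs no display

    # multipartite_layout places nodes in layers according to a node attribute.
    pos = nx.multipartite_layout(G, subset_key="zone")
    nx.draw(G, pos, with_labels=True, font_weight="bold")
    plt.show()
```

With the data loaded as above, draw_by_zone(build_correlation_graph(data)) would draw it. Strong negative correlations are dropped here, just as in the snippet above; compare abs(c) against the threshold instead if those should be connected too.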


Upvotes: 2
