Reputation: 143
I'm trying to build a directed graph from a dataframe containing my node and edge data. The graph is drawn, but when I tried to assign alpha values or specific widths to the edges, I realized that what networkx draws does not match the data in the dataframe (the edges receive the wrong width).
Here is my code:
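# Imports assumed from the names used below
import matplotlib.pyplot as plot
from networkx import (MultiDiGraph, draw_networkx_edges, draw_networkx_labels,
                      draw_networkx_nodes, read_edgelist)
from networkx.drawing import nx_pydot

df = df.drop_duplicates()
df = df.reset_index(drop=True)
edge_list = df.loc[:, ['f', 't', 'v']]
edge_list.to_csv('edges.csv', index=False, header=False)
G = read_edgelist('edges.csv', delimiter=',', create_using=MultiDiGraph(), data=[('weight', float)],
                  edgetype=float)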
pos = nx_pydot.pydot_layout(G, prog='dot')
plot.figure(figsize=(10, 10), dpi=150)
draw_networkx_nodes(G, pos, node_color='skyblue', node_size=5000, nodelist=nodes)
draw_networkx_nodes(G, pos, node_size=2000, node_color='r', nodelist=[address], node_shape='s', edgecolors='black')
draw_networkx_nodes(G, pos, node_size=2000, node_color='r', nodelist=taintsources, node_shape='s', edgecolors='black')
edges = draw_networkx_edges(G, pos, arrows=True, arrowsize=30, arrowstyle='->', edge_color='black')
draw_networkx_labels(G, pos, font_size=9)
# set alphas
i = 0
for a in df['v']:
    edges[i].set_alpha(a)
    i += 1
plot.show()
Now, the data in the dataframe (df after reindexing and dropping) is as follows:
    f            t                                           v          l
0   0xdbd838...  0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be  4.999775   0xaeaac2670575ca1602b598401c43e85513edf7e99974...
1   0xe6c334...  0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be  1.507629   0xcad1b3c29d03dc55234334d906e61dde140b91985a13...
2   0xec7bcd...  0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be  1.428406   0x419685acc8b968b48536d190d2c50dffc7fda8fb8579...
3   0x1fe81d...  0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be  2.973072   0xac8d0f5c672b5e27dad3687606bc2aedffc3611fa2f8...
4   0xe6c334...  0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be  0.714586   0xaa27468c07ba13b185f83e71934ab0e0aa684570faf6...
5   0xdbd838...  0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be  0.714511   0x56a0783f46e8176df3b5833480e6565e3110e8ba952d...
6   0xa92189...  0xdbd838...                                 5.000000   0xda791bbba0fd49e733970ad9b48d6c1fff02d5e93b1b...
7   0x523564...  0xa92189...                                 5.000255   0x0c5e6548f5285520fc03a7ebf5f636e9475c1f678db8...
8   0x5abf99...  0x523564...                                 10.714286  0x24699357078cc1cfe6b3d57a67ffab18ef132dc86996...
9   0xc50be6...  0xe6c334...                                 1.507929   0xe6bf05d6c99db12d1735a62f2c3a9df37941025de2d6...
10  0x523564...  0xc50be6...                                 1.508184   0x62c7a7793294ac46094ceb0c580fcc8593575a8c6fbc...
11  0x329fda...  0x1fe81d...                                 2.977357   0x66bd13433f5ab207aced390ba2c915f913556105f010...
12  0x9dc588...  0x329fda...                                 2.977582   0x09714e7e2dedec960801b74d4838aac99ded96bbb29e...
13  0x523564...  0x9dc588...                                 2.977837   0xe83750f61b3bde48c39ce384d3b480e393e39c78d2fb...
14  0x68a419...  0xec7bcd...                                 1.428571   0xecba439590735ca8cdd69ba5669c8fdcbda68eefeee1...
15  0xfb08f9...  0x68a419...                                 1.428826   0x6c5d9dc5af2074b1ff81ef7f700df837693d6fba21b5...
16  0x523564...  0xfb08f9...                                 1.494462   0x5d404be34d7108a3b029340f39fe2e62e6983dba858f...
There are two problems now: df contains 17 entries, whereas the graph only contains 15(?) edges, so the weights are not assigned to the correct edges. The resulting graph (plot.show()) shows some clearly wrong assignments for the arrow widths (the widest arrow is on the wrong edge). I guess some edges are being merged in the graph, and that causes the mismatch. How can I prevent this, and what is the right way to do it? I'm really thankful for your input! :)
Edit1: Here is my data used in this code (as JSON string):
address = "0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be"
taintsources = ["0x5abf99..."]
nodes = ["0x9dc588...", "0xec7bcd...", "0xdbd838...", "0xc50be6...", "0xa92189...", "0x523564...", "0x1fe81d...", "0xe6c334...", "0x68a419...", "0xfb08f9...", "0x329fda..."]
df (after dropping and resetting the index):
https://pastebin.com/vc8L665V (alpha scaling)
https://pastebin.com/JyDLwdNJ (width scaling)
Edit2: Adjusted the code for more context. Also adjusted the df values: the v column is now scaled between 0.1 and 1.0 (to match the alpha channel) instead of from 1 to 10 (as it was when I previously tried to set a different arrow width per edge).
Edit3: added image:
As can be seen in the image, the edge between 0x5abf99... and 0x523564... is not drawn as a solid connection, although according to the dataframe it should be.
Upvotes: 1
Views: 303
Reputation: 8811
The main culprits are the following two rows:
4 0xe6c334... 0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be 0.714586 0xaa27468c07ba13b185f83e71934ab0e0aa684570faf6...
5 0xdbd838... 0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be 0.714511 0x56a0783f46e8176df3b5833480e6565e3110e8ba952d...
These are duplicate edges: row 4 has the same source and target as row 1, and row 5 the same as row 0. Of the three columns exported to the edge list, only v differs.
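To spot such rows yourself, something like the following should work. This is only a sketch on a made-up dataframe (the short node names and values are placeholders, not your data). Note that a plain drop_duplicates() compares all columns, so rows that share f and t but differ in v survive it.

```python
import pandas as pd

# Hypothetical stand-in for the question's dataframe (made-up names and values).
df = pd.DataFrame(
    {
        "f": ["a", "b", "c", "b", "a"],
        "t": ["x", "x", "x", "x", "x"],
        "v": [5.0, 1.5, 1.4, 0.7, 0.7],
    }
)

# keep=False marks every row of a repeated (f, t) pair, not only the later ones.
repeated = df[df.duplicated(subset=["f", "t"], keep=False)]
print(repeated)

# Number of distinct (f, t) pairs: the edge count a graph without
# parallel edges would end up with.
distinct_pairs = len(df.drop_duplicates(subset=["f", "t"]))
print(distinct_pairs)  # 3
```

If the number of distinct pairs is lower than the number of rows, the data contains parallel edges.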
So you need to create a MultiDiGraph to account for multiple edges with different attributes between the same pair of nodes.
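A small illustration of the difference, using made-up node names and weights rather than your data: a DiGraph keeps one edge per (source, target) pair and overwrites its attributes, while a MultiDiGraph keeps each edge separately.

```python
import networkx as nx

# Hypothetical (source, target, value) rows; the third repeats the first pair.
rows = [("a", "x", 5.0), ("b", "x", 1.5), ("a", "x", 0.7)]

simple = nx.DiGraph()
multi = nx.MultiDiGraph()
for source, target, value in rows:
    simple.add_edge(source, target, weight=value)
    multi.add_edge(source, target, weight=value)

print(simple.number_of_edges())     # 2: the repeated pair collapsed into one edge
print(simple["a"]["x"]["weight"])   # 0.7: the later row overwrote the earlier weight
print(multi.number_of_edges())      # 3: parallel edges are kept
print(multi.number_of_edges("a", "x"))  # 2
```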
Also, I was not sure whether from_pandas_edgelist supports reading edge attributes, so I changed your code to use read_edgelist instead, which accounts for multiple edges:
import networkx as nx

df = df.drop_duplicates()
df = df.reset_index(drop=True)
# Convert the data to node1, node2, attribute_data format
edge_list = df.loc[:, ['f', 't', 'v']]
edge_list.to_csv('edges.csv', index=False, header=False)
# MultiDiGraph keeps parallel edges between the same pair of nodes
G = nx.read_edgelist('edges.csv', delimiter=',', create_using=nx.MultiDiGraph(), data=[('weight', float)], edgetype=float)
print(len(G.edges()))
# output: 17
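Even with all 17 edges present, looping over df['v'] by row index can still pair values with the wrong arrows, because the graph groups edges by source node instead of keeping the row order. A sketch of the effect with made-up data (node names and weights are placeholders); as far as I can tell, draw_networkx_edges draws in the graph's edge order when no edgelist is passed, so the alpha values should be read from the graph rather than from the dataframe:

```python
import networkx as nx

# Hypothetical (source, target, value) rows; the third repeats the first pair.
rows = [
    ("a", "x", 1.0),
    ("b", "x", 0.5),
    ("a", "x", 0.2),
]

G = nx.MultiDiGraph()
for source, target, value in rows:
    G.add_edge(source, target, weight=value)

# Values in dataframe (row) order.
row_order = [value for _, _, value in rows]
# Values in the order the graph reports its edges: grouped by source node.
graph_order = [weight for _, _, weight in G.edges(data="weight")]

print(row_order)    # [1.0, 0.5, 0.2]
print(graph_order)  # [1.0, 0.2, 0.5]
```

In your plotting code that would mean zipping the returned arrow patches with a list built like graph_order instead of iterating over df['v']. Also, from_pandas_edgelist does accept an edge_attr argument together with create_using, so the CSV round trip may not be needed.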
Upvotes: 0