Reputation: 3910
I have ~10k publications that have inbound and/or outbound citations.
The data comes in the following format (two entries as an example):
# each 'number' is a 'paper_id'
citations = {
    '157553241': {
        'inbound_citations': [],
        'outbound_citations': [
            '141919793',
            '158546657',
            '156580052',
            '159778536',
            '157021328',
            '158546657',
            '157021328',
            '141919793',
            '153005744',
            '159778536',
            '112335878',
            '156580052'
        ]
    },
    '54196724': {
        'inbound_citations': ['204753337', '55910675'],
        'outbound_citations': ['153776751', '141060228', '33718066', '158233543']
    },
}
How do I transform this format into something I could feed to networkx?
I'd like to find the most 'central' papers & discover some cliques (to begin with).
I've tried
G = nx.DiGraph(citations)
but I don't think it works like that...
Upvotes: 1
Views: 1443
Reputation: 120559
You need to build a list of edges like this:
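import networkx as nx
import matplotlib.pyplot as plt

edges = []
for node in citations:
    # An inbound citation means another paper cites this one: parent -> node
    for parent in citations[node]['inbound_citations']:
        edges.append((parent, node))
    # An outbound citation means this paper cites another: node -> child
    for child in citations[node]['outbound_citations']:
        edges.append((node, child))

# DiGraph keeps one edge per (source, target) pair, so repeated IDs collapse
G = nx.DiGraph()
G.add_edges_from(edges)

nx.draw(G, with_labels=True)
plt.show()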
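Once the graph is built, the "central papers" and "cliques" part of the question could be approached roughly as follows. This is only a sketch under a few assumptions: the three-paper `citations` dict is made-up sample data in the same shape as the question's, "central" is read as "most cited" (in-degree centrality, which is one of several possible measures), and cliques are searched on an undirected copy of the graph because `nx.find_cliques` works on undirected graphs.

```python
import networkx as nx

# Made-up sample data in the same shape as the question's dict (assumption).
citations = {
    'A': {'inbound_citations': ['B', 'C'], 'outbound_citations': ['D']},
    'B': {'inbound_citations': [], 'outbound_citations': ['A', 'C']},
    'C': {'inbound_citations': ['B'], 'outbound_citations': ['A']},
}

# Build the directed citation graph: an edge u -> v means "u cites v".
G = nx.DiGraph()
for node, links in citations.items():
    G.add_edges_from((parent, node) for parent in links['inbound_citations'])
    G.add_edges_from((node, child) for child in links['outbound_citations'])

# In-degree centrality: fraction of the other papers that cite each paper.
in_centrality = nx.in_degree_centrality(G)
most_cited = max(in_centrality, key=in_centrality.get)

# Maximal cliques, ignoring citation direction.
cliques = [sorted(c) for c in nx.find_cliques(G.to_undirected())]

print(most_cited)
print(sorted(cliques))
```

With ~10k papers, drawing the whole graph is unlikely to be readable, so ranking nodes by a centrality score and inspecting the top few may be more practical than the plot.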
Upvotes: 2