Reputation: 1074
I have the following code that I attempt to use to generate a network matrix. With this matrix, I would like to find the 20 highest-weighted edges that ARE NOT on the diagnal (i.e. i!=j
in the matrix). I also would like to get the names of the nodes (in pairs) composed of these edges.
import heapq
def mapper_network(self, _, info):
G = nx.Graph() #create a graph
for i in range(len(info)):
edge_from = info[0] # edge from
edge_to = info[1] # edge to
weight = info[2] # edge weight
G.add_edge(edge_from, edge_to, weight=weight) #insert the edge to the graph
A = nx.adjacency_matrix(G) # create an adjacency matrix
A_square = A * A # find the product of the matrix
print heapq.nlargest(20, A_square) # to print out the 20 highest weighted edges
With this code, however, I failed to generate the 20 most weighted edges. I obtain raise ValueError("The truth value of an array with more than one "
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
Instead, with this
print heapq.nlargest(20, range(len(A_square)), A_square.take)
It gives me:
raise TypeError("sparse matrix length is ambiguous; use getnnz()"
TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]
With
def mapper_network(self, _, info):
G = nx.Graph()
for i in range(len(info)):
edge_from = info[0]
edge_to = info[1]
weight = info[2]
G.add_edge(edge_from, edge_to, weight=weight)
A = nx.adjacency_matrix(G)
A_square = A * A #can print (A_square.todense())
weight = nx.get_edge_attributes(A_square, weight)
edges = A_square.edges(data = True)
s = sorted(G.edges(data=True), key=lambda (source, target, data): data['weight'])
print s
I received
File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 433, in run
mr_job.execute()
File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 442, in execute
self.run_mapper(self.options.step_num)
File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper
for out_key, out_value in mapper(key, value) or ():
File "MRQ7_trevor.py", line 90, in mapper_network
weight = nx.get_edge_attributes(A_square, weight)
File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/classes/function.py", line 428, in get_edge_attributes
if G.is_multigraph():
File "/home/vagrant/anaconda/lib/python2.7/site-packages/scipy/sparse/base.py", line 499, in __getattr__
raise AttributeError(attr + " not found")
AttributeError: is_multigraph not found
Could someone help me solve this question? Thank you very much!
Upvotes: 1
Views: 784
Reputation: 9753
The issue with this line:
heapq.nlargest(20, A_square)
Is that nlargest
does not expect an iterable of iterables, but an interable of numbers.
So you could do this instead:
heapq.nlargest(20, itertools.chain.from_iterable(A_square))
itertools.chain.iterables
takes an iterable of iterables, and creates a new iterable with the content of all the inner iterables.
However, that won't entierly solve your initial problem, for two reasons:
Why do you take the square of the adjacency matrix? Doing so would only give you the highest weighted sum of paths of length 2 in the graph, which is quite different from what you want. Just use the adjacency matrix instead.
Nowhere in your code you remove the diagonal. You could do it like this: for n in G.number_of_nodes(): A[n][n] = 0
Upvotes: 1