yearntolearn
yearntolearn

Reputation: 1074

How to find the n-largest edge weights in a network matrix (networkx)?

I have the following code that I attempt to use to generate a network matrix. With this matrix, I would like to find the 20 highest-weighted edges that ARE NOT on the diagnal (i.e. i!=j in the matrix). I also would like to get the names of the nodes (in pairs) composed of these edges.

import heapq
def mapper_network(self, _, info):
    G = nx.Graph()  #create a graph
    for i in range(len(info)):   
        edge_from = info[0]  # edge from
        edge_to = info[1]    # edge to
        weight = info[2]     # edge weight
        G.add_edge(edge_from, edge_to, weight=weight) #insert the edge to the graph
    A = nx.adjacency_matrix(G)  # create an adjacency matrix
    A_square = A * A  # find the product of the matrix
    print heapq.nlargest(20, A_square) # to print out the 20 highest weighted edges

With this code, however, I failed to generate the 20 most weighted edges. I obtain raise ValueError("The truth value of an array with more than one " ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

Instead, with this

    print heapq.nlargest(20, range(len(A_square)), A_square.take)

It gives me:

    raise TypeError("sparse matrix length is ambiguous; use getnnz()"
    TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]

With

    def mapper_network(self, _, info):
    G = nx.Graph()
    for i in range(len(info)):
        edge_from = info[0]
        edge_to = info[1]
        weight = info[2]
        G.add_edge(edge_from, edge_to, weight=weight)
    A = nx.adjacency_matrix(G)
    A_square = A * A #can print (A_square.todense())
    weight = nx.get_edge_attributes(A_square, weight)
    edges = A_square.edges(data = True)
    s = sorted(G.edges(data=True), key=lambda (source, target, data): data['weight'])
    print s

I received

  File         "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 433, in run
mr_job.execute()
 File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 442, in execute
self.run_mapper(self.options.step_num)
 File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper
for out_key, out_value in mapper(key, value) or ():
File "MRQ7_trevor.py", line 90, in mapper_network
weight = nx.get_edge_attributes(A_square, weight)
File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/classes/function.py", line 428, in get_edge_attributes
if G.is_multigraph():
File "/home/vagrant/anaconda/lib/python2.7/site-packages/scipy/sparse/base.py", line 499, in __getattr__
raise AttributeError(attr + " not found")

AttributeError: is_multigraph not found

Could someone help me solve this question? Thank you very much!

Upvotes: 1

Views: 784

Answers (1)

Valentin Lorentz
Valentin Lorentz

Reputation: 9753

The issue with this line:

heapq.nlargest(20, A_square)

Is that nlargest does not expect an iterable of iterables, but an interable of numbers.

So you could do this instead:

heapq.nlargest(20, itertools.chain.from_iterable(A_square))

itertools.chain.iterables takes an iterable of iterables, and creates a new iterable with the content of all the inner iterables.


However, that won't entierly solve your initial problem, for two reasons:

  1. Why do you take the square of the adjacency matrix? Doing so would only give you the highest weighted sum of paths of length 2 in the graph, which is quite different from what you want. Just use the adjacency matrix instead.

  2. Nowhere in your code you remove the diagonal. You could do it like this: for n in G.number_of_nodes(): A[n][n] = 0

Upvotes: 1

Related Questions