Peaceful

Reputation: 5470

Using parallel python to parallelize loop over nodes in networkx

I am doing some complex calculations in networkx. The calculations involve computing some quantity over and over for each node in the network. Just as an example of such a calculation, suppose we want to compute the average degree of the neighbors of each node and save that value as a node attribute. The following snippet works for me:

import networkx as nx

G = nx.erdos_renyi_graph(10, 0.5)   # random graph: 10 nodes, edge probability 0.5

def ave_nbr_deg(node):
    # average degree over the neighbors of `node`, stored as a node attribute
    value = 0.
    for nbr in G.neighbors(node):
        value += G.degree(nbr)
    G.node[node]['ave_nbr_deg'] = value/len(G.neighbors(node))

for node in G.nodes():
    ave_nbr_deg(node)

print G.nodes(data = True)

This gives me:

[(0, {'ave_nbr_deg': 5.4}), (1, {'ave_nbr_deg': 5.0}), (2, {'ave_nbr_deg': 5.333333333333333}), (3, {'ave_nbr_deg': 5.2}), (4, {'ave_nbr_deg': 5.6}), (5, {'ave_nbr_deg': 5.6}), (6, {'ave_nbr_deg': 5.2}), (7, {'ave_nbr_deg': 5.25}), (8, {'ave_nbr_deg': 5.5}), (9, {'ave_nbr_deg': 5.5})]

Here I already have a small doubt: the object G is created outside the function ave_nbr_deg, and I have no idea how the function has access to it even though I haven't declared it to be global.

Now, I want to use the parallel python module (pp) to use all the cores on my system for this calculation. After making some changes to the above code, I end up with the following:

import networkx as nx
import pp

G = nx.erdos_renyi_graph(10, 0.5)

def ave_nbr_deg(node):
    value = 0.
    for nbr in G.neighbors(node):
        value += G.degree(nbr)
    G.node[node]['ave_nbr_deg'] = value/len(G.neighbors(node))

job_server = pp.Server(ppservers = ())

print "Starting pp with", job_server.get_ncpus(), "workers"

for node in G.nodes():
    job_server.submit(ave_nbr_deg, args = (node,))()

print G.nodes(data = True)

But it returns the following error:

An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ppworker.py", line 90, in run
    __result = __f(*__args)
  File "<string>", line 5, in ave_nbr_deg
NameError: global name 'G' is not defined

I have tried all sorts of things, including passing the module name nx to submit, but I cannot figure out what the exact problem is. Unfortunately, the pp documentation is too brief to help me solve it. I would be extremely grateful if anybody here could help me with this.

Thanks in advance

Upvotes: 3

Views: 3674

Answers (1)

user1337732

Reputation: 68

To answer your first question: Python first looks for a variable in the local namespace. If it does not find it there, it moves up to the enclosing (module-level) namespace and looks for it there. That's how ave_nbr_deg finds G, even though G is not declared locally. Here's more information on variable scopes: Python scoping
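
For instance (a minimal sketch, separate from your code), a function can read a module-level name without any global declaration, while assignment inside a function creates a local name instead:

x = 10  # defined at module level, i.e. in the global namespace

def read_x():
    # no local x exists, so Python falls back to the global one
    return x + 1

def shadow_x():
    x = 99  # assigning creates a local x that shadows the global one
    return x

print read_x()    # 11
print shadow_x()  # 99
print x           # still 10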

There are two possible solutions:

1 - pass G as an argument to the function ave_nbr_deg, which then needs a matching two-parameter signature such as def ave_nbr_deg(node, G), e.g.

for node in G.nodes():
    job_server.submit(ave_nbr_deg, args = (node,G))()
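
For completeness, here is a sketch of what the modified worker function could look like under option 1 (this exact variant is not from the original post). Keep in mind that pp runs each job in a separate worker process, so an attribute written inside the worker only affects that worker's copy of G; returning the value and assigning it in the submitting loop avoids losing the result:

def ave_nbr_deg(node, G):
    # G is pickled and shipped to the worker together with node
    value = 0.
    for nbr in G.neighbors(node):
        value += G.degree(nbr)
    return value / len(G.neighbors(node))

for node in G.nodes():
    # the object returned by submit blocks and yields the result when called
    G.node[node]['ave_nbr_deg'] = job_server.submit(ave_nbr_deg, args=(node, G))()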

2 - make G available to the worker via the globals argument of submit (from the Parallel Python documentation):

submit(self, func, args=(), depfuncs=(), modules=(),
       callback=None, callbackargs=(), group='default', globals=None)

Submits function to the execution queue

func - function to be executed
args - tuple with arguments of the 'func'
depfuncs - tuple with functions which might be called from 'func'
modules - tuple with module names to import
callback - callback function which will be called with argument
        list equal to callbackargs+(result,)
        as soon as calculation is done
callbackargs - additional arguments for callback function
group - job group, is used when wait(group) is called to wait for
        jobs in a given group to finish
globals - dictionary from which all modules, functions and classes
        will be imported, for instance: globals=globals()

In this instance the function can keep its original one-argument signature, since G is made available through the supplied globals; calling the following should work:

for node in G.nodes():
    job_server.submit(ave_nbr_deg, args = (node,), globals=globals())()
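
One further usage note, whichever option you choose (again a sketch, not part of the original answer): calling the job object right after submit waits for that job to finish before the next one is submitted, so the jobs end up running one at a time. Submitting everything first and collecting the results afterwards lets the workers actually run in parallel, e.g. with the two-argument variant from option 1:

# submit all jobs up front so the workers can run concurrently
jobs = [(node, job_server.submit(ave_nbr_deg, args=(node, G)))
        for node in G.nodes()]

# then collect: calling a job blocks until its result is ready
for node, job in jobs:
    G.node[node]['ave_nbr_deg'] = job()

print G.nodes(data = True)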

Upvotes: 1
