zzzbbx
zzzbbx

Reputation: 10161

Get node list from random walk in networkX

I am new to networkX. I created a graph as follows:

G = nx.read_edgelist(filename,
                     nodetype=int,
                     delimiter=',',
                     data=(('weight', float),))

where the edges are positive, but do not sum up to one.

Is there a built-in method that makes a random walk of k steps from a certain node and return the node list? If not, what is the easiest way of doing it (nodes can repeat)?

Pseudo-code:

node = random
res = [node]
for i in range(0, k)
    read edge weights from this node
    an edge from this node has probability weight / sum_weights
    node = pick an edge from this node 
    res.append(node)

Upvotes: 11

Views: 19030

Answers (5)

pegah
pegah

Reputation: 859

The most efficient way of doing it is to use the transition matrix of a graph in CSR sparse format and, of course, there is a great package for that: csrgraph (pip install csrgraph). Here is how you can do it:

import csrgraph as cg
import numpy as np

G = cg.csrgraph(G, threads=12) 
node_names = G.names
walks = G.random_walks(walklen=10, # length of the walks
                epochs=100, # how many times to start a walk from each node
                start_nodes=None, # the starting node. It is either a list (e.g., [2,3]) or None. If None it does it on all nodes and returns epochs*G.number_of_nodes() walks
                return_weight=1.,
                neighbor_weight=1.)

The result is an array of size (epochs*number_of_nodes, walklen). More info on the function and its parameters can be found here.

On a graph of 2,130 nodes and 36,560 edges, it took me 0.5 seconds to generate 213,000 paths of length 20 with the snippet above:

>>> array([[   0,    4, 1678, ...,   48,  728,   30],
       [   1,   57,  102, ...,  947,  456,  240],
       [   2,  156,  177, ...,  175, 1363,  539],
       ...,
       [2127, 1655, 1656, ..., 1655, 1656, 2127],
       [2128,    4, 1432, ...,  111,   32,  162],
       [2129,    4,  521, ..., 1280,  180,  608]], dtype=uint32)

walks.shape
>>> (213000, 20)

The node names can be mapped back to their original format using the snippet below or other similar methods:

walks = np.vectorize(lambda x: node_names[x])(walks) # map to original node names

NOTE: This package does much more than just random walks, you might want to check their GitHub repo here

Upvotes: 4

Kerighan
Kerighan

Reputation: 368

DISCLAIMER: I'm the author of the package presented below.

I was surprised to see no library do it efficiently, so I've built and open-sourced a python package just for that. It's written in C++ and use parallelization for maximum speed. It can generate millions of random walks in a few seconds.

import walker

walks = walker.random_walks(G, n_walks=15, walk_len=10)

This will create 15 walks for each node in your graph G of length 10.

If you only wish to create one random walk starting from a single node :

node = 42
walks = walker.random_walks(G, n_walks=1, walk_len=10, start_node=[node])

You can also create node2vec-biased random walks by specifying the p and q arguments.

Installation needs pybind11 to allow C++ bindings :

pip install pybind11
pip install graph-walker

Upvotes: 13

lericson
lericson

Reputation: 1338

To expand on the answer of @AlessioX

Let A be an adjacency matrix, i.e. one where A_ij is 1.0 if there is an edge between vertices i and j, and 0.0 if there is not. (Though note that the below works even for a weighted adjacency matrix.)

>>> A
array([[1, 0, 1],
       [1, 1, 1],
       [0, 0, 1]])

Then we can find the transition matrix T by dividing each row by its sum, simply:

>>> A / A.sum(axis=1, keepdims=True)
array([[0.5       , 0.        , 0.5       ],
       [0.33333333, 0.33333333, 0.33333333],
       [0.        , 0.        , 1.        ]])

Upvotes: 1

AlessioX
AlessioX

Reputation: 3177

The easiest way of doing it is by using the transition matrix T and then using a plain Markovian random walk (in brief, the graph can be considered as a finite-state Markov chain).

Let A and D be the adjacency and degree matrices of a graph G, respectively. The transition matrix T is defined as T = D^(-1) A.
Let p^(0) be the state vector (in brief, the i-th component indicates the probability of being at node i) at the beginning of the walk, the first step (walk) can be evaluated as p^(1) = T p^(0).
Iteratively, the k-th random walk step can be evaluated as p^(k) = T p^(k-1).

In plain Networkx terms...

import networkx
import numpy
# let's generate a graph G
G = networkx.gnp_random_graph(5, 0.5)
# let networkx return the adjacency matrix A
A = networkx.adj_matrix(G)
A = A.todense()
A = numpy.array(A, dtype = numpy.float64)
# let's evaluate the degree matrix D
D = numpy.diag(numpy.sum(A, axis=0))
# ...and the transition matrix T
T = numpy.dot(numpy.linalg.inv(D),A)
# let's define the random walk length, say 10
walkLength = 10
# define the starting node, say the 0-th
p = numpy.array([1, 0, 0, 0, 0]).reshape(-1,1)
visited = list()
for k in range(walkLength):
    # evaluate the next state vector
    p = numpy.dot(T,p)
    # choose the node with higher probability as the visited node
    visited.append(numpy.argmax(p))

Upvotes: 10

Ash
Ash

Reputation: 3550

You can use the adjacency matrix. Then you can normalise it so that the sum of rows equals 1 and each row is the probability distribution of the node jumping to another node. You can also have a jump probability if the walker jumps to a random node.

M = nx.adjacency_matrix(g) #obtain the adj. matrix for the graph
#normalise the adjacency matrix
for i in range(M.shape[1]):
    if (np.sum(M[i]) > 0):
    M[i] = M[i]/np.sum(M[i])
p = generate a random number between 0 and 1
if p < restart_prob:
    #do restart
else:
    #choose next node

Then you can choose a node randomly, then choose the next with probability 1-restart_prob or restart the walker with probability restart_prob.

To understand the algorithm better you can look at how PageRank works.

Upvotes: 0

Related Questions