Simd
Simd

Reputation: 21343

How to read a sparse directed graph in scipy

I have a sparse weighted directed graph represented in a file with each line in the format

from to weight

I would like to read it into scipy's compressed sparse format so I can perform simple traversal and graph algorithms on it (or in fact any memory efficient representation). However, given a node I would like to be able to list all its outgoing edges in order of weight quickly without having to sort them each time. It's ok to sort each one once of course.

Is it possible to do this in scipy or using any other python package?

Upvotes: 2

Views: 2130

Answers (1)

Jaime
Jaime

Reputation: 67457

You can load your data with something like this:

import numpy as np
import scipy.sparse as sps

data = np.genfromtxt('data.txt', dtype=[('from', np.intp),
                                        ('to', np.intp),
                                        ('weight', np.float)])

If you want to store your weights in a sparse matrix graph, where graph[i, j] is the weight from node i to node j, you could do:

graph = sps.csr_matrix((data['weight'], (data['from'], data['to'])))

To have a sorted list of outgoing nodes, I would use a dictionary, where sorted_to is an array of outgoing nodes sorted by weight. It's a little hacky, and dependent on the CSR sparse matrix format, but you could do:

graph = sps.rand(10, 10, density=0.1, format='csr')
data, indptr, indices = graph.data, graph.indptr, graph.indices
non_empty_rows, = np.nonzero(np.diff(graph.indptr))
sorted_out = {}
for j in non_empty_rows:
    weight_slice = data[indptr[j]:indptr[j+1]]
    out_slice = indices[indptr[j]:indptr[j+1]]
    sorted_out[j] = out_slice[np.argsort(weight_slice)]

With a simple made up example:

>>> graph = sps.rand(5, 5, density=0.2, format='csr')
>>> graph.toarray()
array([[ 0.88968871,  0.        ,  0.        ,  0.80773932,  0.        ],
       [ 0.        ,  0.        ,  0.8921645 ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.18552664,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.22945956]])
>>> non_empty_rows
array([0, 1, 3, 4], dtype=int64)
>>> sorted_out
{0: array([3, 0]), 1: array([2]), 3: array([0]), 4: array([4])}

Upvotes: 2

Related Questions