gaussit

Reputation: 85

Scipy sparse matrix from edge list

How can I convert an edge list (data) to a Python SciPy sparse matrix to get this result:

[image: sparse matrix in R]

Dataset (where 'agn' is node category one and 'fct' is node category two):

data['agn'].tolist()
['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2', 'p3', 'p3', 'p3', 'p4', 'p4', 'p5']

data['fct'].tolist()
['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6', 'f5', 'f6', 'f7', 'f7', 'f8', 'f9']

Python code (not working):

from scipy.sparse import csr_matrix, coo_matrix

csr_matrix((data['agn'].values, data['fct'].values),
           shape=(len(set(data['agn'].values)), len(set(data['fct'].values))))

-> Error: "TypeError: invalid input format". Do I really need three arrays to construct the matrix, as the examples in the SciPy csr_matrix documentation suggest? (I can only post two links, sorry!)

R code (working), which constructs the matrix from only two vectors:

library(Matrix)

grph_tim <- sparseMatrix(i = as.numeric(data$agn), 
                     j = as.numeric(data$fct),  
                     dims = c(length(levels(data$agn)),
                              length(levels(data$fct))),
                     dimnames = list(levels(data$agn),
                                     levels(data$fct)))

EDIT: It finally worked after I modified the code from here and added the needed array:

import numpy as np
import pandas as pd
import scipy.sparse as ss

def read_data_file_as_coo_matrix(filename='edges.txt'):
    """Read data file and return sparse matrix in coordinate format."""

    # if the nodes are integers, use 'dtype = np.uint32'
    data = pd.read_csv(filename, sep='\t', encoding='utf-8')

    # 'rows' is node category one and 'cols' is node category two;
    # both columns must already hold integer indices
    rows = data['agn']  # Not a copy, just a reference.
    cols = data['fct']

    # crucial third array in Python (the values), which can be left out in R
    ones = np.ones(len(rows), dtype=np.uint32)
    matrix = ss.coo_matrix((ones, (rows, cols)))
    return matrix

Additionally, I converted the string names of the nodes to integers. Thus data['agn'] becomes [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4] and data['fct'] becomes [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8].

I get this sparse matrix:

(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
(0, 4) 1
(1, 2) 1
(1, 3) 1
(1, 4) 1
(1, 5) 1
(2, 4) 1
(2, 5) 1
(2, 6) 1
(3, 6) 1
(3, 7) 1
(4, 8) 1

Upvotes: 3

Views: 3117

Answers (1)

gaussit

Reputation: 85

It finally worked after I modified the code from here and added the needed array:

import numpy as np
import pandas as pd
import scipy.sparse as ss

def read_data_file_as_coo_matrix(filename='edges.txt'):
    """Read data file and return sparse matrix in coordinate format."""

    # if the nodes are integers, use 'dtype = np.uint32'
    data = pd.read_csv(filename, sep='\t', encoding='utf-8')

    # 'rows' is node category one and 'cols' is node category two;
    # both columns must already hold integer indices
    rows = data['agn']  # Not a copy, just a reference.
    cols = data['fct']

    # crucial third array in Python (the values), which can be left out in R
    ones = np.ones(len(rows), dtype=np.uint32)
    matrix = ss.coo_matrix((ones, (rows, cols)))
    return matrix

Additionally, I converted the string names of the nodes to integers. Thus data['agn'] becomes [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4] and data['fct'] becomes [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8].
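
For reference, the string-to-integer conversion can be sketched with pandas.factorize, which numbers the labels in order of first appearance. This is only a minimal sketch using the dataset from the question; the variable names (row_labels, col_labels) are made up for illustration, and the explicit shape argument is an addition that keeps the dimensions fixed even if the last node has no edges.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

# Edge list from the question
agn = ['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2',
       'p3', 'p3', 'p3', 'p4', 'p4', 'p5']
fct = ['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6',
       'f5', 'f6', 'f7', 'f7', 'f8', 'f9']

# factorize returns integer codes plus the unique labels (in order of appearance)
rows, row_labels = pd.factorize(pd.Series(agn))
cols, col_labels = pd.factorize(pd.Series(fct))

# The values array: one entry per edge
ones = np.ones(len(rows), dtype=np.uint32)

matrix = ss.coo_matrix((ones, (rows, cols)),
                       shape=(len(row_labels), len(col_labels)))
print(matrix.shape, matrix.nnz)
```

The labels in row_labels and col_labels play roughly the role of dimnames in the R version, since SciPy sparse matrices do not store names themselves.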

I get this sparse matrix:

(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
(0, 4) 1
(1, 2) 1
(1, 3) 1
(1, 4) 1
(1, 5) 1
(2, 4) 1
(2, 5) 1
(2, 6) 1
(3, 6) 1
(3, 7) 1
(4, 8) 1
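
To compare this with the R output, it may help to look at a dense, labeled view. The sketch below assumes the integer arrays given above and passes them to csr_matrix in the three-array form (values, (row indices, column indices)), which is what the original two-array call was missing. The label lists are written out by hand from the dataset, and the DataFrame wrapper is just one way to attach names; it is only sensible for small matrices.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

# Integer node indices as listed above
rows = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
cols = np.array([0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8])
ones = np.ones(len(rows), dtype=np.uint32)

# Three-array form: (values, (row indices, column indices))
csr = ss.csr_matrix((ones, (rows, cols)), shape=(5, 9))

# Dense, labeled view for inspection (labels taken from the dataset)
dense = pd.DataFrame(csr.toarray(),
                     index=['p1', 'p2', 'p3', 'p4', 'p5'],
                     columns=['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9'])
print(dense)
```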

Upvotes: 1
