user50781

Reputation: 51

Convert pandas list of edges to sparse transition matrix with dictionary for nodes row and column positions?

I have a large dataframe with a list of edges in a bipartite graph. I want to transform it into a sparse transition matrix in Python.

So I have a dataframe with a list of edges linking nodes from part 1 (a,b,c) with nodes from part 2 (x,y,z). Edges have multiplicity: in the example, there are two edges from b to y.

start  end  multiplicity
    a    x             1
    a    y             1
    b    y             2
    b    z             1
    c    x             1
    c    z             1

The result I want is a sparse matrix, 3x3 in this case. I have dictionaries for part 1 and part 2, indicating which node corresponds to which row and which column of the resulting transition matrix:

dic1 = {'a':0,'b':1,'c':2}
dic2 = {'x':1,'y':0,'z':2}

So I want the matrix

  y x z
a 1 1 0
b 2 0 1
c 0 1 1

...but in sparse form (csr_matrix, lil_matrix, or coo_matrix). I have tried iterating over the list of edges, but it is too slow for long lists. Also, approaches based on pivot will generate full matrices, which will be slow and memory consuming. Is there an efficient way to obtain the sparse matrix I want?

Upvotes: 3

Views: 373

Answers (1)

anky

Reputation: 75110

From what I understand, you can try pivot + reindex with Index.map (I have added two variables, m and final, for readability; you can merge them into one expression after testing):

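# keyword arguments: positional arguments to pivot are rejected by pandas >= 2.0
m = df.pivot(index='start', columns='end', values='multiplicity').fillna(0).rename_axis(index=None,columns=None)
# argsort of the mapped positions orders the labels by their target row/column
final = m.reindex(index=m.index[m.index.map(dic1).argsort()],columns=m.columns[m.columns.map(dic2).argsort()])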

print(final)

     y    x    z
a  1.0  1.0  0.0
b  2.0  0.0  1.0
c  0.0  1.0  1.0
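
Since the question asks for a sparse result and pivot builds a dense frame first, here is a minimal sketch of one possible alternative: map the labels to positions with Series.map and hand the triplets to scipy.sparse.coo_matrix. This is not part of the answer above. It assumes the dictionaries cover every label in the edge list and that their values are the positions 0..n-1, so that len(dic1) and len(dic2) give the matrix shape. The names rows, cols, vals, sparse and csr are only illustrative.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Example edge list and position dictionaries from the question
df = pd.DataFrame({
    "start": ["a", "a", "b", "b", "c", "c"],
    "end": ["x", "y", "y", "z", "x", "z"],
    "multiplicity": [1, 1, 2, 1, 1, 1],
})
dic1 = {"a": 0, "b": 1, "c": 2}
dic2 = {"x": 1, "y": 0, "z": 2}

# Translate node labels to row/column positions without a Python-level loop
rows = df["start"].map(dic1).to_numpy()
cols = df["end"].map(dic2).to_numpy()
vals = df["multiplicity"].to_numpy()

# COO format is built directly from (data, (row, col)) triplets
# Assumption: dictionary values are exactly 0..n-1, so len() gives the shape
sparse = coo_matrix((vals, (rows, cols)), shape=(len(dic1), len(dic2)))

# Convert to CSR for arithmetic and row slicing
csr = sparse.tocsr()

# Dense view only to check the small example; avoid this on large data
print(csr.toarray())
```

If a label is missing from a dictionary, Series.map yields NaN for it and the position arrays become floats, so it may be worth checking for missing values before building the matrix.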

Upvotes: 2
