Reputation: 51
I have a large dataframe containing a list of edges in a bipartite graph. I want to transform it into a sparse transition matrix in Python.
The dataframe lists edges linking nodes from part 1 (a,b,c) to nodes from part 2 (x,y,z). Edges have multiplicity: in the example, there are two edges from b to y.
start end multiplicity
a x 1
a y 1
b y 2
b z 1
c x 1
c z 1
The result I want is a sparse matrix, 3x3 in this case. I have dictionaries for parts 1 and 2, indicating which row (part 1) or column (part 2) of the resulting transition matrix each node corresponds to:
dic1 = {'a':0,'b':1,'c':2}
dic2 = {'x':1,'y':0,'z':2}
So I want the matrix
y x z
a 1 1 0
b 2 0 1
c 0 1 1
...but in sparse form (csr_matrix, lil_matrix, or coo_matrix). I have tried iterating over the list of edges, but it is too slow for long lists. Also, approaches based on pivot generate full (dense) matrices, which are slow and memory-consuming. Is there an efficient way to obtain the sparse matrix I want?
Upvotes: 3
Views: 373
Reputation: 75110
From what I understand, you can try pivot + reindex with Index.map (I have added two variables, m and final, for readability; you can merge them into one expression after testing):
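# pandas >= 2.0 requires keyword arguments for pivot, so df.pivot(*df) no longer works
m = df.pivot(index='start', columns='end', values='multiplicity').fillna(0).rename_axis(index=None, columns=None)
# argsort of the mapped positions orders the labels by their target row/column
final = m.reindex(index=m.index[m.index.map(dic1).argsort()], columns=m.columns[m.columns.map(dic2).argsort()])
print(final)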
y x z
a 1.0 1.0 0.0
b 2.0 0.0 1.0
c 0.0 1.0 1.0
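Note that pivot still builds a dense intermediate, which the question wanted to avoid. As a rough sketch of an alternative (not part of the original answer), the edge list can be passed to scipy.sparse.coo_matrix as (data, (row, col)) triplets after mapping the labels through dic1 and dic2. This assumes every label in the dataframe is a key in the corresponding dictionary and that the dictionary values are exactly the positions 0..n-1; the names rows, cols, vals and sparse are only illustrative.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Example edge list and label-to-position dictionaries from the question
df = pd.DataFrame({
    "start": ["a", "a", "b", "b", "c", "c"],
    "end": ["x", "y", "y", "z", "x", "z"],
    "multiplicity": [1, 1, 2, 1, 1, 1],
})
dic1 = {"a": 0, "b": 1, "c": 2}
dic2 = {"x": 1, "y": 0, "z": 2}

# Translate node labels to row/column positions in one vectorized pass
# (assumption: no label is missing from the dictionaries, otherwise map yields NaN)
rows = df["start"].map(dic1).to_numpy()
cols = df["end"].map(dic2).to_numpy()
vals = df["multiplicity"].to_numpy()

# Build the matrix in COO form, then convert to CSR for arithmetic and row slicing;
# entries sharing the same (row, col) pair are summed during the conversion
sparse = coo_matrix((vals, (rows, cols)), shape=(len(dic1), len(dic2))).tocsr()

# Dense view only to check the small example; avoid this on the large data
print(sparse.toarray())
```

No dense matrix is created along the way, so memory use should scale with the number of edges rather than with rows times columns.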
Upvotes: 2