Reputation: 31186
Given an adjacency list:
adj_list = [array([0,1]),array([0,1,2]),array([0,2])]
And an array of indices,
ind_arr = array([0,1,2])
Goal:
A = np.zeros((3,3))
for i in ind_arr:
A[i,list(adj_list[x])] = 1.0/float(adj_list[x].shape[0])
Currently, I have written:
A[ind_list[:],adj_list[:]] = 1. / len(adj_list[:])
And tried various configurations of indexing within this scaffold.
Upvotes: 1
Views: 152
Reputation: 221504
Here's one approach -
lens = np.array([len(i) for i in adj_list])
col_idx = np.concatenate(adj_list)
out = np.zeros((len(lens), col_idx.max()+1))
row_idx = np.repeat(np.arange(len(lens)), lens)
vals = np.repeat(1.0/lens, lens)
out[row_idx, col_idx] = vals
Sample input, output -
In [494]: adj_list = [np.array([0,2]),np.array([0,1,4])]
In [496]: out
Out[496]:
array([[ 0.5 , 0. , 0.5 , 0. , 0. ],
[ 0.33333333, 0.33333333, 0. , 0. , 0.33333333]])
Sparse matrix as output
Additionally, if you want to save memory and create a sparse matrix instead, that's an easy extension -
In [506]: from scipy.sparse import csr_matrix
In [507]: csr_matrix((vals, (row_idx, col_idx)), shape=(len(lens), col_idx.max()+1))
Out[507]:
<2x5 sparse matrix of type '<type 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [508]: _.toarray()
Out[508]:
array([[ 0.5 , 0. , 0.5 , 0. , 0. ],
[ 0.33333333, 0.33333333, 0. , 0. , 0.33333333]])
Upvotes: 1
Reputation: 214927
I don't think you can completely eliminate loops due to the mixed data types, but you can reduce the nested double for loops to a single one:
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
arr_size = len(arr)
A[i, :arr_size] = 1./arr_size
A
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Or if the numbers in the arrays are actually columns positions:
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
A[i, arr] = 1./len(arr)
A
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Another option using MultiLabelBinarizer
from sklearn
(but may not be as efficient):
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
adj_list = [np.array([0,1]),np.array([0,1,2])]
sizes = np.fromiter(map(len, adj_list), dtype=int)
mlb.fit_transform(adj_list)/sizes[:,None]
# array([[ 0.5 , 0.5 , 0. ],
# [ 0.33333333, 0.33333333, 0.33333333]])
Upvotes: 1