Reputation: 33
Im looking for an efficient 'for loop' avoiding solution that solves an array related problem I'm having. I want to use a huge 1Darray (A -> size = 250.000)
of values between 0 and 40 for indexing in one dimension, and a array (B) with the same size with values between 0 and 9995 for indexing in a second dimension.
The result should be an array with size (41, 9996) with for each index the amount of times that any value from array 1 occurs at a value from array 2.
Example:
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
which should result in:
[[0, 1, 0,
[0, 0, 0,
[0, 0, 1,
[0, 0, 2,
[1, 0, 0]]
The dirty way is too slow as the amount of data is huge, what you would be able to do is:
out = np.zeros(41,9995)
for i in A:
for j in B:
out[i,j] += 1
which will take 238.000 * 238.000 loops... I've tried this, which works partially:
out = np.zeros(41,9995)
out[A,B] += 1
Which generates a result with 1 everywhere, regardless of the amount of times the values occur.
Does anyone have a clue how to fix this? Thanks in advance!
Upvotes: 3
Views: 104
Reputation: 221614
Given that the numbers are in a small range, bincount
would be a good choice for bin-based summing -
def accumulate_coords(A,B):
nrows = A.max()+1
ncols = B.max()+1
return np.bincount(A*ncols+B,minlength=nrows*ncols).reshape(-1,ncols)
Sample run -
In [55]: A
Out[55]: array([0, 3, 2, 4, 3])
In [56]: B
Out[56]: array([1, 2, 2, 0, 2])
In [58]: accumulate_coords(A,B)
Out[58]:
array([[0, 1, 0],
[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[1, 0, 0]])
Upvotes: 1
Reputation: 114866
You are looking for a sparse tensor:
import torch
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
idx = torch.LongTensor([A, B])
torch.sparse.FloatTensor(idx, torch.ones(idx.shape[1]), torch.Size([5,3])).to_dense()
Output:
tensor([[0., 1., 0.], [0., 0., 0.], [0., 0., 1.], [0., 0., 2.], [1., 0., 0.]])
You can also do the same with scipy sparse matrix:
import numpy as np
from scipy.sparse import coo_matrix
coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()
output:
array([[0., 1., 0.], [0., 0., 0.], [0., 0., 1.], [0., 0., 2.], [1., 0., 0.]])
Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.
Upvotes: 3
Reputation: 61910
Use numpy.add.at:
import numpy as np
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
arr = np.zeros((5, 3))
np.add.at(arr, (A, B), 1)
print(arr)
Output
[[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]
[0. 0. 2.]
[1. 0. 0.]]
Upvotes: 3