Reputation: 33

How do you efficiently sum the occurences of a value in one array at positions in another array

Im looking for an efficient 'for loop' avoiding solution that solves an array related problem I'm having. I want to use a huge 1Darray (A -> size = 250.000) of values between 0 and 40 for indexing in one dimension, and a array (B) with the same size with values between 0 and 9995 for indexing in a second dimension.

The result should be an array with size (41, 9996) with for each index the amount of times that any value from array 1 occurs at a value from array 2.

Example:

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
which should result in:
[[0, 1, 0,
 [0, 0, 0,
 [0, 0, 1,
 [0, 0, 2,
 [1, 0, 0]]

The dirty way is too slow as the amount of data is huge, what you would be able to do is:

out = np.zeros(41,9995)
for i in A:
  for j in B:
     out[i,j] += 1

which will take 238.000 * 238.000 loops... I've tried this, which works partially:

out = np.zeros(41,9995)
out[A,B] += 1

Which generates a result with 1 everywhere, regardless of the amount of times the values occur.

Does anyone have a clue how to fix this? Thanks in advance!

Upvotes: 3

Answers (3)

Divakar

Reputation: 221614

Given that the numbers are in a small range, bincount would be a good choice for bin-based summing -

def accumulate_coords(A,B):
    nrows = A.max()+1
    ncols = B.max()+1
    return np.bincount(A*ncols+B,minlength=nrows*ncols).reshape(-1,ncols)

Sample run -

In [55]: A
Out[55]: array([0, 3, 2, 4, 3])

In [56]: B
Out[56]: array([1, 2, 2, 0, 2])

In [58]: accumulate_coords(A,B)
Out[58]: 
array([[0, 1, 0],
       [0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [1, 0, 0]])

Upvotes: 1

Shai

Reputation: 114866

You are looking for a sparse tensor:

import torch

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
idx = torch.LongTensor([A, B])
torch.sparse.FloatTensor(idx, torch.ones(idx.shape[1]), torch.Size([5,3])).to_dense()

Output:

tensor([[0., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.],
        [0., 0., 2.],
        [1., 0., 0.]])

You can also do the same with scipy sparse matrix:

import numpy as np
from scipy.sparse import coo_matrix

coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()

output:

array([[0., 1., 0.],
       [0., 0., 0.],
       [0., 0., 1.],
       [0., 0., 2.],
       [1., 0., 0.]])

Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.

Upvotes: 3

Dani Mesejo

Reputation: 61910

Use numpy.add.at:

import numpy as np

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]

arr = np.zeros((5, 3))
np.add.at(arr, (A, B), 1)

print(arr)

Output

[[0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]
 [0. 0. 2.]
 [1. 0. 0.]]

Upvotes: 3

How do you efficiently sum the occurences of a value in one array at positions in another array

Answers (3)

Related Questions