trypython

Reputation: 33

Vectorize simple for loop in numpy

I'm pretty new to numpy and I'm trying to vectorize a simple for loop for performance reasons, but I can't come up with a solution. I have a numpy array of unique words, and for each of these words I need the number of times it occurs in another numpy array, called array_to_compare. Each count goes into a third numpy array with the same shape as the unique-words array. Here is the code with the for loop:

import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
vector_array = np.zeros(len(unique_words))  # one count per unique word

for word in np.nditer(unique_words):
    # Count how many elements of array_to_compare equal this word
    counter = np.count_nonzero(array_to_compare == word)
    # Store the count at this word's position in unique_words
    vector_array[np.where(unique_words == word)] = counter

print(vector_array)  # desired output: [2. 1. 0. 1.]

I tried np.where and np.isin but did not get the desired result. I'd be thankful for any help!

Upvotes: 2

Views: 649

Answers (3)

Daniel Lenz

Reputation: 3857

I'd probably use a Counter and a list comprehension to solve this:

In [1]: import numpy as np
   ...:
   ...: unique_words = np.array(['a', 'b', 'c', 'd'])
   ...: array_to_compare = np.array(['a', 'b', 'a', 'd'])

In [2]: from collections import Counter

In [3]: counter = Counter(array_to_compare)

In [4]: counter
Out[4]: Counter({'a': 2, 'b': 1, 'd': 1})

In [5]: vector_array = np.array([counter[key] for key in unique_words])

In [6]: vector_array
Out[6]: array([2, 1, 0, 1])

Building the Counter takes linear time in the length of array_to_compare, and iterating through your unique_words is linear too, since each Counter lookup is an (average) constant-time hash lookup.
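A minimal sketch wrapping the same idea in a reusable function (the name `count_with_counter` is hypothetical, not from the answer). It relies on the fact that indexing a `Counter` with a key it has never seen returns 0 instead of raising `KeyError`, which is why `'c'` comes out as 0 without any special case.

```python
from collections import Counter

import numpy as np


def count_with_counter(unique_words, array_to_compare):
    """Count how often each entry of unique_words occurs in array_to_compare."""
    # One linear pass over the compared array builds a hash map of counts.
    counter = Counter(array_to_compare.tolist())
    # Counter returns 0 for unseen keys, so absent words need no special handling.
    return np.array([counter[word] for word in unique_words.tolist()])


unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
print(count_with_counter(unique_words, array_to_compare))  # [2 1 0 1]
```

Calling `tolist()` first hands plain Python strings to `Counter`, which avoids iterating element by element over the numpy arrays.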

Upvotes: 2

hpaulj

Reputation: 231335

A pure numpy approach compares every unique word with every element of array_to_compare using broadcasting:

In [76]: unique_words[:,None]==array_to_compare
Out[76]: 
array([[ True, False,  True, False],
       [False,  True, False, False],
       [False, False, False, False],
       [False, False, False,  True]])
In [77]: (unique_words[:,None]==array_to_compare).sum(1)
Out[77]: array([2, 1, 0, 1])

In [78]: timeit (unique_words[:,None]==array_to_compare).sum(1)
9.5 µs ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
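The broadcasting version spelled out as a function (the name `count_with_broadcasting` is hypothetical). `unique_words[:, None]` has shape `(m, 1)` and `array_to_compare` has shape `(n,)`, so the comparison builds an `(m, n)` boolean matrix, and summing along axis 1 counts the matches for each unique word. Because that intermediate matrix has m*n elements, this is fast for small inputs, but its memory use grows with the product of the two lengths.

```python
import numpy as np


def count_with_broadcasting(unique_words, array_to_compare):
    """Count occurrences of each unique word via an (m, n) comparison matrix."""
    # (m, 1) == (n,) broadcasts to an (m, n) boolean matrix.
    matches = unique_words[:, None] == array_to_compare
    # Each row holds the matches for one unique word; True sums as 1.
    return matches.sum(axis=1)


unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
print((unique_words[:, None] == array_to_compare).shape)  # (4, 4)
print(count_with_broadcasting(unique_words, array_to_compare))  # [2 1 0 1]
```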

But Counter is also a good choice:

In [72]: %%timeit
    ...: c=Counter(array_to_compare)
    ...: [c[key] for key in unique_words]
12.7 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Your count_nonzero loop can be made faster like this:

In [73]: %%timeit
    ...: words=unique_words.tolist()
    ...: vector_array = np.zeros(len(words))
    ...: for i,word in enumerate(words):
    ...:     counter = np.count_nonzero(array_to_compare == word)
    ...:     vector_array[i] = counter
    ...: 
23.4 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Iterating over a list is faster than iterating over an array (nditer doesn't add much here), and enumerate gives the index directly, so the np.where lookup can be skipped.
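A small check of what each approach actually iterates over: every step of `np.nditer` yields a zero-dimensional ndarray, whereas `tolist()` converts the elements to native Python strings up front. This only illustrates the types involved; it is not a benchmark.

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])

# Each item produced by nditer is a 0-d array, not a Python scalar.
first_from_nditer = next(iter(np.nditer(unique_words)))
print(type(first_from_nditer).__name__, first_from_nditer.ndim)  # ndarray 0

# tolist() returns plain Python str objects.
first_from_list = unique_words.tolist()[0]
print(type(first_from_list).__name__)  # str
```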

Upvotes: 1

Kraigolas

Reputation: 5560

Similar to @DanielLenz's answer, but using np.unique to create a dict:

import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
# Map each distinct value in array_to_compare to its number of occurrences
counts = dict(zip(*np.unique(array_to_compare, return_counts=True)))
# Words missing from the dict (here 'c') get a count of 0
result = np.array([counts.get(word, 0) for word in unique_words])
print(result)  # [2 1 0 1]
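A fully vectorized variant of the same np.unique idea replaces the dict with `np.searchsorted`. This is an alternative sketch, not part of the answer above, and `count_with_unique` is a hypothetical name. Because `np.unique` returns its values sorted, each word can be located by binary search; positions where the value found does not equal the word get a count of 0.

```python
import numpy as np


def count_with_unique(unique_words, array_to_compare):
    """Vectorized count of each unique word using np.unique + np.searchsorted."""
    values, counts = np.unique(array_to_compare, return_counts=True)
    if values.size == 0:
        # Nothing to match against: every word occurs zero times.
        return np.zeros(len(unique_words), dtype=int)
    # values is sorted, so binary search gives each word's candidate position.
    idx = np.searchsorted(values, unique_words)
    # Words greater than every value get idx == len(values); clip to stay in bounds.
    idx = np.clip(idx, 0, len(values) - 1)
    # Keep the count only where the candidate value actually equals the word.
    found = values[idx] == unique_words
    return np.where(found, counts[idx], 0)


unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
print(count_with_unique(unique_words, array_to_compare))  # [2 1 0 1]
```

This avoids both the Python-level loop and the m×n intermediate matrix of the broadcasting approach, at the cost of sorting array_to_compare inside np.unique.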

Upvotes: 1
