Reputation: 3043
I have two NumPy arrays whose entries are strings. The first array (array1) has shape (m, n), where m > 1 and n > 1. The second array (array2) has shape (p,), where p is an integer greater than 1. Entries in array2 are unique, while array1 is likely to contain the same string several times.
I want to replace array1 with an array of the same shape in which each string is replaced by its index in array2. Every entry of array1 is guaranteed to match some entry of array2.
Speed matters here, and I want to find the fastest way of doing this.
Here is a small example:
import numpy as np
array1 = np.asarray([['aa', 'cc', 'bb', 'aa', 'aa', 'bb'],
                     ['cc', 'bb', 'cc', 'bb', 'aa', 'aa'],
                     ['bb', 'cc', 'aa', 'aa', 'bb', 'cc']])
array2 = np.asarray(['aa', 'bb', 'cc'])
This is how I am approaching the problem for now:
for k in range(array1.shape[0]):
    # For each row, find the index in array2 of every entry (nested Python loops)
    array1[k] = np.asarray([j for i in range(array1.shape[1]) for j in range(len(array2)) if array1[k,i]==array2[j]])
print(array1)
[['0' '2' '1' '0' '0' '1']
 ['2' '1' '2' '1' '0' '0']
 ['1' '2' '0' '0' '1' '2']]
However, when array1 has a very large number of rows and columns, this approach is not very fast.
What would be a faster way to achieve this?
Upvotes: 0
Reputation: 61910
A possible alternative:
import numpy as np
array1 = np.asarray([['aa', 'cc', 'bb', 'aa', 'aa', 'bb'],
                     ['cc', 'bb', 'cc', 'bb', 'aa', 'aa'],
                     ['bb', 'cc', 'aa', 'aa', 'bb', 'cc']])
array2 = np.asarray(['aa', 'bb', 'cc'])
d = {v: k for k, v in enumerate(array2)}
result = np.vectorize(d.get)(array1)
print(result)
Output
[[0 2 1 0 0 1]
 [2 1 2 1 0 0]
 [1 2 0 0 1 2]]
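As a hedged variation on the dictionary lookup, the sketch below wraps it in a function. The name encode_with_dict and the choices to pass otypes and to use __getitem__ instead of get are my assumptions, not part of the answer. They fix the output dtype and make an unknown string raise KeyError rather than produce None. Keep in mind that np.vectorize is documented as a convenience wrapper that loops in Python, so it may not be the fastest option for very large arrays.

```python
import numpy as np

array1 = np.asarray([['aa', 'cc', 'bb', 'aa', 'aa', 'bb'],
                     ['cc', 'bb', 'cc', 'bb', 'aa', 'aa'],
                     ['bb', 'cc', 'aa', 'aa', 'bb', 'cc']])
array2 = np.asarray(['aa', 'bb', 'cc'])


def encode_with_dict(labels, vocabulary):
    """Hypothetical helper: replace each string in labels by its index in vocabulary."""
    # Build the string -> index mapping once, from plain Python strings.
    lookup = {value: index for index, value in enumerate(vocabulary.tolist())}
    # otypes pins the result to an integer dtype; __getitem__ raises KeyError
    # for a string that is not in vocabulary (assumption: failing loudly is wanted).
    return np.vectorize(lookup.__getitem__, otypes=[np.intp])(labels)


result = encode_with_dict(array1, array2)
print(result)
```

The result has the same shape as array1 and an integer dtype, so it can be used directly for indexing.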
Upvotes: 1
Reputation: 221524
Since every entry of array1 is present in array2, we can use np.searchsorted:
sidx = array2.argsort()
out = sidx[np.searchsorted(array2,array1.ravel(),sorter=sidx).reshape(array1.shape)]
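To illustrate what the sorter argument and the final sidx[...] indexing do, here is a small sketch with an array2 that is deliberately unsorted. The sample data is made up for illustration and is not taken from the question.

```python
import numpy as np

# Illustrative data (an assumption, not from the question): array2 is unsorted.
array2 = np.asarray(['cc', 'aa', 'bb'])
array1 = np.asarray([['aa', 'cc', 'bb'],
                     ['bb', 'bb', 'cc']])

# sidx lists the positions of array2 in sorted order: 'aa' (1), 'bb' (2), 'cc' (0).
sidx = array2.argsort()

# Position of each array1 entry within the sorted view of array2.
pos = np.searchsorted(array2, array1.ravel(), sorter=sidx)

# Translate sorted positions back to positions in the original array2.
out = sidx[pos].reshape(array1.shape)

print(out)
# [[1 0 2]
#  [2 2 0]]
```

Indexing array2 with out reproduces array1, which is a quick way to confirm the mapping.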
If array2 is already sorted, we can skip argsort and the corresponding indexing step:
out = np.searchsorted(array2,array1.ravel()).reshape(array1.shape)
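The searchsorted approach relies on every entry of array1 being present in array2. For a missing string it returns an insertion position instead of failing, so the result can be silently wrong. If that guarantee might not hold, a check along these lines could help. This is a sketch, and the helper name encode_checked is hypothetical.

```python
import numpy as np


def encode_checked(labels, vocabulary):
    """Hypothetical helper: indices of labels in vocabulary, raising on unknown strings."""
    sidx = vocabulary.argsort()
    pos = np.searchsorted(vocabulary, labels.ravel(), sorter=sidx)
    # searchsorted returns len(vocabulary) for values past the end;
    # clip so the indexing below cannot go out of bounds.
    pos = np.clip(pos, 0, len(vocabulary) - 1)
    out = sidx[pos].reshape(labels.shape)
    # Round-trip check: looking the indices up again must give back the labels.
    if not np.array_equal(vocabulary[out], labels):
        raise ValueError("some entries of labels are missing from vocabulary")
    return out


array2 = np.asarray(['aa', 'bb', 'cc'])
array1 = np.asarray([['aa', 'cc', 'bb'],
                     ['cc', 'bb', 'aa']])
out = encode_checked(array1, array2)
print(out)
```

The round-trip comparison costs one extra pass over the data, so it can be dropped once the inputs are known to be clean.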
Upvotes: 3