Reputation: 497
I have an array like this:
myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]
For this, np.unique(myarray, return_counts=True) works well and gives me the desired output. However, I would now like to apply it row by row and have it tell me that, in the first row, the counts for d and e are 0.
So far I've been trying to append all the letters to each row inside a for loop and then subtract 1 from each count, but even that has me confused. I've tried these two approaches:
for i in range(mylen):
    unique, counts = np.unique(np.array([list(myarray[i]), 'a', 'b', 'c', 'd', 'e']), return_counts=True) # attempt 1
    unique, counts = np.unique(np.vstack((myarray[i], 'a', 'b', 'c', 'd', 'e')), return_counts=True) # attempt 2
But neither works. Does anyone have an elegant solution? This will be used for thousands, perhaps millions, of values, so computation time is somewhat relevant to the discussion.
Upvotes: 2
Views: 1429
Reputation: 114320
You can use np.unique with return_inverse=True to get what you want:
myarray = np.array(myarray)  # the question's myarray is a list; .shape needs an ndarray
letters, inv = np.unique(myarray, return_inverse=True)
inv = inv.reshape(myarray.shape)
inv is now
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]], dtype=int64)
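To make the role of the inverse array concrete, here is a minimal sketch (using the question's sample data, converted to an ndarray) showing that each entry of inv is the index of the corresponding element in letters, so indexing letters with inv rebuilds the original array. The variable name reconstructed is only for illustration.

```python
import numpy as np

myarray = np.array([['a', 'b', 'c'],
                    ['b', 'c', 'd'],
                    ['c', 'd', 'e']])

letters, inv = np.unique(myarray, return_inverse=True)
# Reshape explicitly so the code behaves the same whether NumPy hands back
# a flat inverse or one already shaped like the input.
inv = inv.reshape(myarray.shape)

# inv[i, j] is the position of myarray[i, j] within the sorted unique values,
# so fancy-indexing letters with inv reproduces the input.
reconstructed = letters[inv]

print(letters)
print(inv)
print(np.array_equal(reconstructed, myarray))
```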
You can get counts of all the unique elements in one line:
>>> (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)
array([[1, 0, 0],
       [1, 1, 0],
       [1, 1, 1],
       [0, 1, 1],
       [0, 0, 1]])
The first dimension corresponds to the letter in letters, the second to the row number, since sum(-1) sums across the columns. You can get counts for the columns using sum(1) instead. In your symmetric example, the result will be identical.
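Because the result above is indexed as (letter, row), it is the transpose of the layout the question describes, where each row of the input gets its own row of counts. A minimal sketch of both reductions, transposed to put rows (or columns) first, might look like this; the names mask, per_row and per_col are just illustrative.

```python
import numpy as np

myarray = np.array([['a', 'b', 'c'],
                    ['b', 'c', 'd'],
                    ['c', 'd', 'e']])

letters, inv = np.unique(myarray, return_inverse=True)
inv = inv.reshape(myarray.shape)

# Broadcasting a (n_letters, 1, 1) index array against the (n_rows, n_cols)
# inverse gives mask[k, i, j] == True where myarray[i, j] == letters[k].
mask = inv == np.arange(len(letters)).reshape(-1, 1, 1)

per_row = mask.sum(-1).T  # shape (n_rows, n_letters): counts within each row
per_col = mask.sum(1).T   # shape (n_cols, n_letters): counts within each column

print(per_row)
print(per_col)
```

With the sample data, per_row[0] lines up with letters, so the zero counts for d and e in the first row appear explicitly.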
No looping, no np.apply_along_axis (which is a glorified loop), all vectorized. Here is a quick timing test:
import string
import numpy as np
np.random.seed(42)
myarray = np.random.choice(list(string.ascii_lowercase), size=(100, 100))
def Epsi95(arr):
    uniques = np.unique(arr)
    def fun(x):
        # start every unique value at 0, then overwrite with this row's counts
        base_dict = dict(zip(uniques, [0]*uniques.shape[0]))
        base_dict.update(dict(zip(*np.unique(x, return_counts=True))))
        return [i[-1] for i in sorted(base_dict.items())]
    return np.apply_along_axis(fun, 1, arr)
def MadPhysicist(myarray):
    letters, inv = np.unique(myarray, return_inverse=True)
    inv = inv.reshape(myarray.shape)
    return (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)
%timeit Epsi95(myarray)
6.37 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit MadPhysicist(myarray)
1.28 ms ± 6.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
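One thing to keep in mind with the broadcast comparison is that it builds a temporary boolean array of shape (n_letters, n_rows, n_cols), which may become large for millions of values. As a possible alternative (a different technique from the one timed above, and not benchmarked here), the same inverse codes can be fed to np.bincount after shifting each row into its own block of bins. The function name row_counts_bincount is hypothetical, and the sketch assumes a 2-D input.

```python
import numpy as np

def row_counts_bincount(arr):
    """Return (uniques, counts) where counts[i, k] is how often uniques[k]
    occurs in row i. Assumes arr is 2-D."""
    arr = np.asarray(arr)
    uniques, inv = np.unique(arr, return_inverse=True)
    inv = inv.reshape(arr.shape)
    n_rows, n_uniques = arr.shape[0], len(uniques)
    # Shift the codes of row i by i * n_uniques so that every row
    # lands in its own contiguous block of n_uniques bins.
    offsets = np.arange(n_rows).reshape(-1, 1) * n_uniques
    flat = (inv + offsets).ravel()
    # minlength guarantees trailing zero-count bins are kept.
    counts = np.bincount(flat, minlength=n_rows * n_uniques)
    return uniques, counts.reshape(n_rows, n_uniques)

sample = [['a', 'b', 'c'],
          ['b', 'c', 'd'],
          ['c', 'd', 'e']]
uniques, counts = row_counts_bincount(sample)
print(uniques)
print(counts)
```

Note that this returns the counts as (row, letter), the transpose of what MadPhysicist returns above.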
Upvotes: 2
Reputation: 9047
import numpy as np
myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]
arr = np.array(myarray)
uniques = np.unique(arr)
def fun(x):
    # start every unique value at 0, then overwrite with this row's counts
    base_dict = dict(zip(uniques, [0]*uniques.shape[0]))
    base_dict.update(dict(zip(*np.unique(x, return_counts=True))))
    return [i[-1] for i in sorted(base_dict.items())]
np.apply_along_axis(fun, 1, arr)
# array([[1, 1, 1, 0, 0],   # a=1 b=1 c=1 d=0 e=0
#        [0, 1, 1, 1, 0],
#        [0, 0, 1, 1, 1]], dtype=int64)
Upvotes: 1
Reputation: 1005
You can iterate over the rows of the list and, for each row, over the unique values of the entire array. The example below prints the counts, but the same loop can be used to store them in a dictionary or any other structure of your choosing.
Example:
import numpy as np
myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]
uniq = np.unique(np.array(myarray))
for idx, row in enumerate(myarray):
    for x in uniq:
        print(f"Row {idx} Element ({x}) Count: {row.count(x)}")
Output:
Row 0 Element (a) Count: 1
Row 0 Element (b) Count: 1
Row 0 Element (c) Count: 1
Row 0 Element (d) Count: 0
Row 0 Element (e) Count: 0
Row 1 Element (a) Count: 0
Row 1 Element (b) Count: 1
Row 1 Element (c) Count: 1
Row 1 Element (d) Count: 1
Row 1 Element (e) Count: 0
Row 2 Element (a) Count: 0
Row 2 Element (b) Count: 0
Row 2 Element (c) Count: 1
Row 2 Element (d) Count: 1
Row 2 Element (e) Count: 1
To use a list of dictionaries for each row:
import numpy as np
myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]
uniq = np.unique(np.array(myarray))
row_vals = []
for row in myarray:
    counts = {}  # avoid naming this "dict", which would shadow the built-in
    for x in uniq:
        counts[x] = row.count(x)
    row_vals.append(counts)
for r in row_vals:
    print(r)
Output:
{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0}
{'a': 0, 'b': 1, 'c': 1, 'd': 1, 'e': 0}
{'a': 0, 'b': 0, 'c': 1, 'd': 1, 'e': 1}
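Calling row.count(x) once per unique value rescans the row each time. If that becomes a concern, one possible variation (a sketch using the standard library's collections.Counter rather than the loop above) counts each row in a single pass and then fills in the zeros, since a Counter returns 0 for keys it has not seen.

```python
from collections import Counter

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

# Sorted unique values across the whole nested list.
uniq = sorted({x for row in myarray for x in row})

row_vals = []
for row in myarray:
    seen = Counter(row)  # one pass over the row
    # Looking up a missing key in a Counter yields 0 rather than raising.
    row_vals.append({x: seen[x] for x in uniq})

for r in row_vals:
    print(r)
```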
Upvotes: 0