Whitehot

Reputation: 497

Numpy unique: count for values also not in array?

I have an array as so:

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

For the whole array, np.unique(myarray, return_counts=True) works well and gives me the desired output. However, I would like to apply it row by row and have it report zero counts too: for example, in the first row, the counts for d and e should be 0.

For the moment, I've been trying to append every possible value to each row inside a for loop and then subtract 1 from each count, but even that has me confused. I've tried these two approaches:

for i in range(mylen):
    unique, counts = np.unique(np.array([list(myarray[i]), 'a', 'b', 'c', 'd', 'e']), return_counts=True) # attempt 1
    unique, counts = np.unique(np.vstack((myarray[i], 'a', 'b', 'c', 'd', 'e')), return_counts=True) # attempt 2

But neither works. Does anyone have an elegant solution? This will be used for thousands, perhaps millions, of values, so computation time is somewhat relevant to the discussion.

Upvotes: 2

Views: 1429

Answers (3)

Mad Physicist

Reputation: 114320

You can use np.unique with return_inverse=True to get what you want:

myarray = np.array(myarray)  # the question's myarray is a nested list, which has no .shape
letters, inv = np.unique(myarray, return_inverse=True)
inv = inv.reshape(myarray.shape)

inv is now

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]], dtype=int64)

You can get counts of all the unique elements in one line:

>>> (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)
array([[1, 0, 0],
       [1, 1, 0],
       [1, 1, 1],
       [0, 1, 1],
       [0, 0, 1]])

The first dimension corresponds to the letter in letters, the second to the row number, since sum(-1) sums across the columns. You can get counts for the columns using sum(1) instead. In your symmetrical example, the result will be identical.
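If you would rather have one result row per input row, as the question describes, the counts can be transposed and paired with their letters. This is a minimal sketch built on the code above; the names counts, per_row and row_dicts are only illustrative.

```python
import numpy as np

myarray = np.array([['a', 'b', 'c'],
                    ['b', 'c', 'd'],
                    ['c', 'd', 'e']])

letters, inv = np.unique(myarray, return_inverse=True)
inv = inv.reshape(myarray.shape)

# counts[k, r] = number of times letters[k] appears in row r
counts = (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)

# Transpose so that per_row[r, k] lines up with row r of myarray
per_row = counts.T

# Optionally label each count with its letter (illustrative only)
row_dicts = [dict(zip(letters.tolist(), row.tolist())) for row in per_row]
print(row_dicts[0])
```

Here the first dictionary reports zero for d and e, which are absent from the first row.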

No looping, no np.apply_along_axis (which is a glorified loop), all vectorized. Here is a quick timing test:

import string
import numpy as np

np.random.seed(42)
myarray = np.random.choice(list(string.ascii_lowercase), size=(100, 100))

def Epsi95(arr):
    uniques = np.unique(arr)
    def fun(x):
        base_dict = dict(zip(uniques, [0]*uniques.shape[0]))
        base_dict.update(dict(zip(*np.unique(x, return_counts=True))))
        return [i[-1] for i in sorted(base_dict.items())]
    return np.apply_along_axis(fun, 1, arr)

def MadPhysicist(myarray):
    letters, inv = np.unique(myarray, return_inverse=True)
    inv = inv.reshape(myarray.shape)
    return (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)

%timeit Epsi95(myarray)
6.37 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit MadPhysicist(myarray)
1.28 ms ± 6.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
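One caveat: the broadcast comparison builds a temporary boolean array of shape (number of letters, rows, columns), which may become large for millions of values. A possible alternative, not part of the timing above and not benchmarked here, is to offset each row's inverse indices into its own block of bins and count them with np.bincount. The function name row_counts_bincount is hypothetical, and the sketch assumes a 2-D input.

```python
import numpy as np

def row_counts_bincount(arr):
    """Return (letters, counts) where counts[r, k] counts letters[k] in row r."""
    arr = np.asarray(arr)
    letters, inv = np.unique(arr, return_inverse=True)
    inv = inv.reshape(arr.shape)
    n_rows, n_letters = arr.shape[0], len(letters)
    # Shift row r's codes by r * n_letters so each row gets its own block of bins
    flat = inv + np.arange(n_rows).reshape(-1, 1) * n_letters
    counts = np.bincount(flat.ravel(), minlength=n_rows * n_letters)
    return letters, counts.reshape(n_rows, n_letters)

letters, counts = row_counts_bincount([['a', 'b', 'c'],
                                       ['b', 'c', 'd'],
                                       ['c', 'd', 'e']])
print(letters)
print(counts)
```

Note that this result is laid out with rows first and letters second, which is the transpose of what MadPhysicist returns.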

Upvotes: 2

Epsi95

Reputation: 9047

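import numpy as np

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

arr = np.array(myarray)

# Every value that appears anywhere in the array, sorted
uniques = np.unique(arr)

def fun(x):
    # Start every value at 0, then overwrite with the counts found in this row
    base_dict = dict(zip(uniques, [0]*uniques.shape[0]))
    base_dict.update(dict(zip(*np.unique(x, return_counts=True))))
    # Return the counts ordered by value
    return [i[-1] for i in sorted(base_dict.items())]

# Apply fun to each row (axis 1)
np.apply_along_axis(fun, 1, arr)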

# array([[1, 1, 1, 0, 0], # a=1 b=1 c=1 d=0 e=0
#        [0, 1, 1, 1, 0],
#        [0, 0, 1, 1, 1]], dtype=int64)

Upvotes: 1

Abstract

Reputation: 1005

You can iterate over the rows of the list and, for each row, over the unique values of the entire array. The example below prints the counts, but the same loop can store them in a dictionary or any other structure of your choosing.

Example:

import numpy as np

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

uniq = np.unique(np.array(myarray))

for idx, row in enumerate(myarray):
    for x in uniq:
        print(f"Row {idx} Element ({x}) Count: {row.count(x)}")

Output:

Row 0 Element (a) Count: 1
Row 0 Element (b) Count: 1
Row 0 Element (c) Count: 1
Row 0 Element (d) Count: 0
Row 0 Element (e) Count: 0
Row 1 Element (a) Count: 0
Row 1 Element (b) Count: 1
Row 1 Element (c) Count: 1
Row 1 Element (d) Count: 1
Row 1 Element (e) Count: 0
Row 2 Element (a) Count: 0
Row 2 Element (b) Count: 0
Row 2 Element (c) Count: 1
Row 2 Element (d) Count: 1
Row 2 Element (e) Count: 1

To build a list with one dictionary per row:

import numpy as np

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

uniq = np.unique(np.array(myarray))
row_vals = []

for row in myarray:
    counts = {}  # avoid naming this "dict", which would shadow the built-in
    for x in uniq:
        counts[x] = row.count(x)
    row_vals.append(counts)

for r in row_vals:
    print(r)

Output:

{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0}
{'a': 0, 'b': 1, 'c': 1, 'd': 1, 'e': 0}
{'a': 0, 'b': 0, 'c': 1, 'd': 1, 'e': 1}
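Since row.count(x) rescans the row once per unique value, a variation using collections.Counter from the standard library tallies each row in a single pass. This is only a sketch of the same idea, not a benchmarked improvement; all_values and seen are illustrative names.

```python
from collections import Counter

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

# Sorted list of every value that appears anywhere in the data
all_values = sorted({x for row in myarray for x in row})

row_vals = []
for row in myarray:
    seen = Counter(row)  # one pass over the row
    # A Counter returns 0 for keys it has not seen
    row_vals.append({x: seen[x] for x in all_values})

for r in row_vals:
    print(r)
```

This produces the same list of dictionaries as the loop above.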

Upvotes: 0
